
Infinitely Repeated Prisoners’ Dilemma and Decomposition

Author

Alex Ralphs

1 Repeated Games

Let’s start with a simple 2x2 game commonly used for repeated games: a prisoners’ dilemma (PD). The PD is used for repeated games because, once the game is repeated, the defect action is no longer necessarily the best choice. The payoff matrix is shown in Figure 1 below, where T>R>P>S.

Code
from IPython.display import display, HTML
payoff_matrix = open("payoff_matrix.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = 'T',
R = 'R',
S = 'S',
P = 'P')
display(HTML(payoff_matrix))
Co-operate Defect
Co-operate R, R S, T
Defect T, S P, P
Figure 1: Prisoners’ Dilemma Payoff Matrix

One could argue that it is not necessary for R>P for this game to be considered a PD, but this condition implies that mutual co-operation is better than mutual defection, a generally desired quality. I imagine this is also linked to the behavioural component. The ‘iterated’ (repeated) PD is also usually required to satisfy 2R>T+S, so that mutual co-operation gives a greater payoff than alternating co-operation and defection.

The repeated PD allows for many more possible strategies that are also NE. This differs from the one-shot game, where defect is dominant. Such strategies include Tit-for-Tat (with or without forgiveness), Pavlov (win-stay, lose-switch), and Grim Trigger. The best strategy depends on the strategic breakdown of the population: a strategy that performs best against one population mix may perform far worse against another.
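
As a quick illustration, a minimal sketch of these two defining conditions in code is below (the payoffs used are illustrative, not from any particular study).

Code
# Check the two conditions discussed above:
# (1) T > R > P > S defines a prisoners' dilemma;
# (2) 2R > T + S makes mutual co-operation beat alternating C and D.
def is_pd(T, R, S, P):
    return T > R > P > S

def cooperation_beats_alternating(T, R, S, P):
    return 2 * R > T + S

# Illustrative payoffs (chosen here for the example only)
T, R, S, P = 5, 3, 0, 1
print(is_pd(T, R, S, P))                          # True
print(cooperation_beats_alternating(T, R, S, P))  # True: 6 > 5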

2 Decomposition

Using the decomposition of (Jessie and Saari 2016), the 3 decomposed parts are below:

Code
from IPython.display import display, HTML

payoff_matrix = open("2x2_decomposition_general.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = 'T',
R = 'R',
S = 'S',
P = 'P',
eta_1 = r'\frac{T-R}{2}',
eta_2 = r'\frac{P-S}{2}',
beta = r'\frac{T+R-S-P}{4}',
kappa = r'\frac{T+R+S+P}{4}')

display(HTML(payoff_matrix))
Co-operate Defect
Co-operate R, R S, T
Defect T, S P, P

= (strategic part)

Co-operate Defect
Co-operate $$-\frac{T-R}{2}, -\frac{T-R}{2}$$ $$-\frac{P-S}{2}, \frac{T-R}{2}$$
Defect $$\frac{T-R}{2}, -\frac{P-S}{2}$$ $$\frac{P-S}{2}, \frac{P-S}{2}$$

+ (behavioural part)

Co-operate Defect
Co-operate $$\frac{T+R-S-P}{4}, \frac{T+R-S-P}{4}$$ $$-\frac{T+R-S-P}{4}, \frac{T+R-S-P}{4}$$
Defect $$\frac{T+R-S-P}{4}, -\frac{T+R-S-P}{4}$$ $$-\frac{T+R-S-P}{4}, -\frac{T+R-S-P}{4}$$

+ (kernel part)

Co-operate Defect
Co-operate $$\frac{T+R+S+P}{4}, \frac{T+R+S+P}{4}$$ $$\frac{T+R+S+P}{4}, \frac{T+R+S+P}{4}$$
Defect $$\frac{T+R+S+P}{4}, \frac{T+R+S+P}{4}$$ $$\frac{T+R+S+P}{4}, \frac{T+R+S+P}{4}$$

The decomposition of the PD can be represented by 4 numbers: the strategic elements from the two outcomes in which the row player defects (Equations 1 and 2), the behavioural element from either outcome in which the row player co-operates (Equation 3), and any of the kernel elements (Equation 4). This is a general property of the decomposition in symmetric 2x2 games, and for a PD all of these numbers are positive. For the strategic elements, the ‘defect’ entries were chosen because they are positive (the ‘co-operate’ entries are their negatives). Likewise, ‘co-operate’ is chosen for the behavioural element, as the PD makes this positive. This positive behavioural component is what makes a co-operative strategy what it is (analogous to a Hawk-Dove game). The PD is therefore described as:

\[\eta_1=\frac{T-R}{2}>0 \tag{1}\] \[\eta_2=\frac{P-S}{2}>0 \tag{2}\] \[\beta=\frac{T+R-S-P}{4}>0 \tag{3}\] \[\kappa=\frac{T+R+S+P}{4} \tag{4}\]

The prior condition that 2R>T+S, which ensures that mutual co-operation is better than alternating co-operation and defection, can be written in terms of the decomposition. Each payoff is the sum of a strategic, behavioural and a kernel element:

\[R=-\eta_1+\beta+\kappa \] \[T=\eta_1+\beta+\kappa \] \[S=-\eta_2-\beta+\kappa \]

Therefore:

\[2R=2(-\eta_1+\beta+\kappa)>\eta_1-\eta_2+2\kappa=T+S \] \[\Rightarrow2\beta>3\eta_1-\eta_2 \]

When the game is repeated N times, the total payoff is simply the sum of the payoffs from all of the realised outcomes.
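
A small sketch of the decomposition as code is below: it maps (T, R, S, P) into the four elements, rebuilds the payoffs from them, and checks that 2R>T+S is equivalent to \(2\beta>3\eta_1-\eta_2\). The function name decompose is just for illustration; the payoffs happen to be one of the games used further down.

Code
# Map a symmetric PD (T, R, S, P) to its decomposed elements.
def decompose(T, R, S, P):
    eta_1 = (T - R) / 2          # strategic element, opponent plays C
    eta_2 = (P - S) / 2          # strategic element, opponent plays D
    beta = (T + R - S - P) / 4   # behavioural element
    kappa = (T + R + S + P) / 4  # kernel element
    return eta_1, eta_2, beta, kappa

T, R, S, P = 50, 32, 12, 25
eta_1, eta_2, beta, kappa = decompose(T, R, S, P)

# Rebuild the payoffs: each is the sum of a strategic, behavioural and kernel element.
assert R == -eta_1 + beta + kappa
assert T == eta_1 + beta + kappa
assert S == -eta_2 - beta + kappa
assert P == eta_2 - beta + kappa

# 2R > T + S is equivalent to 2*beta > 3*eta_1 - eta_2.
print(2 * R > T + S, 2 * beta > 3 * eta_1 - eta_2)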

3 Infinitely Repeated Games

Results from (Dal Bó 2019) suggest that 3 strategies can explain the majority of chosen strategies in infinitely repeated PDs: Always defect (AD); Tit-for-Tat (TFT); and Grim Trigger (Grim). They also find that as the probability of continuation increases, subjects tend to choose shorter punishments (prefer TFT over Grim trigger). Also, a low payoff to mutual co-operation is associated with Suspicious TFT, which starts with defect instead of co-operate.

Their experimental design did not actually infinitely repeat the PD (as this would be impossible), but instead randomly terminated the repeating process. This is common, as participants do not know the end point, thus should treat every round the same, just as in an infinitely repeated game.

They had 2 phases: in phase 1 the participants played the game as usual; in phase 2 the participants were asked to specify a strategy. Phase 2 is where they elicited strategies, and the method by which they did this varied between sessions. In some sessions participants would be asked to make an initial decision (C or D), and then a second decision conditional on the 4 possible outcomes of the first game (e.g. if (C, C) then C, if (C, D) then D). These are ‘memory-one’ strategies, as they only look backwards one period. In other sessions, they could choose from a larger set of strategies (including ‘memory-one’):

  • Always Co-op or always Defect
  • Co-op for X rounds, then Defect for the rest
  • Randomise C and D with probability X%
  • Grim Trigger
  • TFT or STFT
  • If both chose the same action, then Co-operate, o/w Defect (WSLS) (Pavlov)
  • Co-op, but if anyone Defects in the first X rounds, then defect (Grim-X)
  • TFXT - only punish if the opponent Defects for X rounds
  • Co-op, but if anyone Defects in the first X rounds, then switch to defect for X rounds, repeat (T-X)
  • Build your own (Memory One)

The stage game used is shown below. Interestingly, they only change the value of R, keeping T, S and P constant. The possible values for R are 32 or 48. They also use probabilities of continuation of 0.5, 0.75 and 0.9; a higher value leads to longer expected supergames. They also use a point system where 100 points is equal to $0.45, so it may be useful to convert the payoffs into their monetary equivalents.

Code
from IPython.display import display, HTML
payoff_matrix = open("payoff_matrix.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = 50,
R = 'R',
S = 12,
P = 25)
display(HTML(payoff_matrix))
Co-operate Defect
Co-operate R, R 12, 50
Defect 50, 12 25, 25
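
As a rough aside (my own arithmetic, not from the paper), the expected length of a randomly terminated supergame is \(1/(1-\delta)\), and the point payoffs can be converted into dollars at the stated rate of 100 points = $0.45:

Code
# Expected number of rounds in a randomly terminated supergame with
# continuation probability delta is 1 / (1 - delta).
for delta in (0.5, 0.75, 0.9):
    print(delta, 1 / (1 - delta))   # 2, 4 and 10 expected rounds

# Convert the stage-game payoffs (in points) to dollars: 100 points = $0.45.
points_to_usd = 0.45 / 100
for points in (50, 48, 32, 25, 12):
    print(points, round(points * points_to_usd, 4))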

A simple table showing the decomposition of these two games is below.

R $$\eta_1$$ $$\eta_2$$ $$\beta$$ $$\kappa$$
32 9 6.5 11.25 29.75
48 1 6.5 15.25 33.75

The full decomposition of both games is below also.

Code
from IPython.display import display, HTML
T=50
R=32
S=12
P=25
payoff_matrix = open("2x2_decomposition.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = T,
R = R,
S = S,
P = P, 
eta_1 = (T-R)/2, 
eta_2 = (P-S)/2,
beta = (T+R-S-P)/4, 
kappa = (T+R+S+P)/4)

display(HTML(payoff_matrix))
Co-operate Defect
Co-operate 32, 32 12, 50
Defect 50, 12 25, 25

= (strategic part)

Co-operate Defect
Co-operate -9.0, -9.0 -6.5, 9.0
Defect 9.0, -6.5 6.5, 6.5

+ (behavioural part)

Co-operate Defect
Co-operate 11.25, 11.25 -11.25, 11.25
Defect 11.25, -11.25 -11.25, -11.25

+ (kernel part)

Co-operate Defect
Co-operate 29.75, 29.75 29.75, 29.75
Defect 29.75, 29.75 29.75, 29.75
Code
from IPython.display import display, HTML
T=50
R=48
S=12
P=25
payoff_matrix = open("2x2_decomposition.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = T,
R = R,
S = S,
P = P, 
eta_1 = (T-R)/2, 
eta_2 = (P-S)/2,
beta = (T+R-S-P)/4, 
kappa = (T+R+S+P)/4)

display(HTML(payoff_matrix))
Co-operate Defect
Co-operate 48, 48 12, 50
Defect 50, 12 25, 25

= (strategic part)

Co-operate Defect
Co-operate -1.0, -1.0 -6.5, 1.0
Defect 1.0, -6.5 6.5, 6.5

+ (behavioural part)

Co-operate Defect
Co-operate 15.25, 15.25 -15.25, 15.25
Defect 15.25, -15.25 -15.25, -15.25

+ (kernel part)

Co-operate Defect
Co-operate 33.75, 33.75 33.75, 33.75
Defect 33.75, 33.75 33.75, 33.75

One can see that the increase in R both decreases the strategic element when the opponent plays ‘Co-op’ and increases the behavioural component of playing ‘Co-op’ oneself. This is as expected, and both effects would explain some of the data from the authors’ experiments. The general results from their paper are shown in Figure 2 below.

Figure 2: (Dal Bó 2019): Table 3

The results show some interesting trends. It seems that a higher value of R leads to more TFT and Grim Trigger strategies (and more AC), and fewer AD and STFT (and other) strategies. Similarly, the higher the probability of continuation, the higher the prevalence of TFT and Grim Trigger.

The decomposition would suggest that more C would be played when R is larger, and this is partly reflected by an increase in the prevalence of AC strategies.

4 Nash Equilibria

4.1 Always Defect

It is a Nash equilibrium to defect in every round. This is stated in a folk theorem: any NE of the stage game, played in every round, is a NE of the finitely repeated game. It also makes sense through backward induction: both players are best off defecting in the final round, and knowing this they will also defect in the round before, and so on. In no round will it be beneficial to co-operate, thus mutual defection is the NE. For co-operation to be supported under NE, the players need to be unaware of the number of rounds \(N\). However, experiments have shown that people do co-operate even when the number of rounds is known, and some of the more common strategies are below.

4.2 Grim Trigger

This is a well-known strategy which aims to punish bad behaviour indefinitely. The strategy states that one should co-operate until one’s opponent defects, and then defect forever after. It can be seen as a less forgiving, and generally under-performing, version of tit-for-tat, as a tit-for-tat strategy will only punish for as long as one’s opponent continues to defect. To find the NE we must compare the payoff of the Grim trigger strategy with a one-round deviation. Suppose that up to period \(t\) both players have played C and no-one has defected (we can assume this; see other authors, I will not prove it here). The total expected payoff, given the probability of continuation \(\delta\), for player \(i\) of action \(a\) is given by:

\[\Pi(a)=\sum_{t=0}^{\infty} \delta^t \pi_i (a_i^t, a_{-i}^t) \]

Given that both players mutually co-operated last period \((t-1)\) the expected payoffs of co-operating and defecting are:

\[ \Pi(C)=R(1+\delta+\delta^2+\cdots)=\frac{R}{1-\delta} \]

\[ \Pi(D)=T+\delta P(1+\delta+\delta^2+\cdots)=T+\frac{\delta P}{1-\delta} \]

Here it is of course assumed that both players are playing a grim trigger strategy; thus if playing C is preferred now it will be preferred in every future period, because with an infinite horizon the decision tomorrow is identical to the decision today following mutual co-operation. It isn’t strictly necessary to assume this, as it is implied anyway, but we can also state that the strategy starts with playing C.

Also, if they defect, then their opponent defects next period, thus there is mutual defection forever after the first defection. For the grim trigger to be NE it is necessary (and sufficient) that \(\Pi(C)\ge\Pi(D)\). This simplifies to:

\[\frac{R}{1-\delta}\ge T+\frac{\delta P}{1-\delta} \] \[\delta\ge\frac{T-R}{T-P} \]

One can then represent this in terms of the decomposition to understand how the strategic and behavioural components affect this NE:

\[\delta\ge\bar{\delta}_{Grim}=\frac{2\eta_1}{\eta_1-\eta_2+2\beta} \]

This critical value \(\bar{\delta}_{Grim}\) is both strategic and behavioural: it is increasing in both strategic elements and decreasing in the behavioural component. This implies that as the strategic elements increase (making defection relatively higher paying), the range of values of \(\delta\) for which this is a NE shrinks, so one would expect fewer Grim trigger strategies to be used. On the other hand, the larger the behavioural component, the smaller \(\bar{\delta}_{Grim}\), which widens the range of values of \(\delta\) that support the NE. One would then expect more Grim trigger strategies as the behavioural component increases. Intuitively, one could argue that this is because the behavioural component increases R, which is true, but this is countered by the same increase in T; the behavioural component also reduces P, making the grim trigger a harsher punishment and therefore a more effective deterrent of defection. To see this, take the inequality for the critical value below:

\[R+\frac{\delta R}{1-\delta}\ge T+\frac{\delta P}{1-\delta} \]

The first terms are the payoffs in period t of co-operation (R) and defection (T) respectively. The second terms are the expected payoffs that follow from these decisions: R forever if they co-operate, and P forever if they deviate. Let’s rearrange this so that these effects are on the same side as each other:

\[T-R\le \frac{\delta}{1-\delta}(R-P) \] \[2\eta_1\le\frac{\delta}{1-\delta}(-\eta_1-\eta_2+2\beta) \]

The left-hand side is the initial gain from deviation, and the right-hand side is the expected future gain from co-operation. For NE the right must outweigh the left. If the behavioural component increases, both R and T increase by the same amount, leaving the left-hand side unchanged; they cancel each other out. Thus, behavioural changes only influence the future gains from co-operation.

The strategic influence is quite different as it affects both sides: \(\eta_1\) increases the left and decreases the right, whilst \(\eta_2\) only decreases the right. Thus, even though both strategic elements have the same direction of effect on \(\bar{\delta}_{Grim}\), \(\eta_1\) is more powerful (the effect is larger).
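
As a small numeric sketch of this, the critical value can be computed for the paper’s two stage games (T=50, S=12, P=25 with R=32 or R=48), both from the payoffs and from the decomposed elements:

Code
# Critical discount factor for Grim trigger: delta >= (T - R) / (T - P),
# or equivalently 2*eta_1 / (eta_1 - eta_2 + 2*beta).
T, S, P = 50, 12, 25

for R in (32, 48):
    eta_1, eta_2 = (T - R) / 2, (P - S) / 2
    beta = (T + R - S - P) / 4
    from_payoffs = (T - R) / (T - P)
    from_decomp = 2 * eta_1 / (eta_1 - eta_2 + 2 * beta)
    print(R, round(from_payoffs, 3), round(from_decomp, 3))
# R = 32: delta_Grim = 18/25 = 0.72; R = 48: delta_Grim = 2/25 = 0.08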

4.3 Tit-for-Tat (TFT)

Generally considered the most robust strategy, the tit-for-tat strategy states that one will co-operate in the first round, and then in each subsequent round do whatever one’s opponent did in the previous round. Tit-for-Tat is a special case of a limited punishment strategy where the player only punishes for 1 period. One can extend the punishment to any finite length, or even make it infinite, which would be a grim trigger.

To find the critical values for the TFT NE one uses the same steps as for the grim trigger NE. Again, the prior choices will have been mutual co-operation, as stated in the TFT strategy. Given this history, the payoff of continuing to co-operate is R, and will be R forever on the NE path.

\[\Pi(C)=R(1+\delta+\delta^2+\cdots)=\frac{R}{1-\delta}\]

This is the same on-NE path expected payoff as the grim trigger strategy. The payoff for defecting will be T in the first period, P in the next and then R forever, as the punishment has finished.

\[\Pi(D)=T+\delta P+\delta^2R(1+\delta+\delta^2+\cdots)=T+\delta P+\frac{\delta^2R}{1-\delta} \]

Therefore, for TFT to be a NE this following inequality must be satisfied:

\[\frac{R}{1-\delta}\ge T+\delta P+\frac{\delta^2R}{1-\delta} \] \[R+\delta R+\frac{\delta^2R}{1-\delta}\ge T+\delta P+\frac{\delta^2R}{1-\delta} \] \[\delta\ge\delta_{TFT}=\frac{T-R}{R-P} \]

This was simplified directly, but one can also find the result via the quadratic below. One solution is not within the range of zero to one and is disregarded.

\[R\geq T-\delta T+\delta\left(1-\delta\right)P+\delta^2R \] \[\delta^2\left(R-P\right)-\delta\left(T-P\right)+T-R\le0 \] \[\delta=\frac{T-P\pm\sqrt{\left(T-P\right)^2-4\left(R-P\right)\left(T-R\right)}}{2\left(R-P\right)} \] \[\sqrt{\left(T-P\right)^2-4\left(R-P\right)\left(T-R\right)}=\sqrt{T^2-2TP+P^2+4R^2-4RT+4TP-4RP} \] \[=\sqrt{T^2+2TP+P^2+4R^2-4RT-4RP}=\sqrt{\left(T-2R+P\right)^2} \] \[\delta=\frac{T-P\pm\left(T-2R+P\right)}{2\left(R-P\right)} \]

Again, I can insert the decomposition into the critical value:

\[\delta_{TFT}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta} \]

The implication here is similar to grim trigger: both strategic elements increase the critical value, \(\eta_1\) more so than \(\eta_2\); the behavioural component decreases the critical value. The only difference between the critical value of TFT and that of grim trigger is in the denominator (an R in place of a T), which means that \(\delta_{TFT}>\delta_{Grim}\).
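
The same sketch for the TFT threshold, alongside the grim trigger one, for the paper’s two games; note that for R=32 the TFT threshold is above 1, so TFT cannot be a NE there:

Code
# delta_TFT = (T - R) / (R - P) versus delta_Grim = (T - R) / (T - P).
T, S, P = 50, 12, 25

for R in (32, 48):
    delta_tft = (T - R) / (R - P)
    delta_grim = (T - R) / (T - P)
    print(R, round(delta_tft, 3), round(delta_grim, 3))
# R = 32: delta_TFT = 18/7 ≈ 2.571 (> 1, so no TFT NE), delta_Grim = 0.72
# R = 48: delta_TFT = 2/23 ≈ 0.087, delta_Grim = 0.08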

4.4 General Limited Punishment Strategies

This is the general case for grim trigger and Tit-for-Tat, where those two strategies are the extreme versions of a strategy which punishes defection for some positive number of periods.

As with TFT and grim trigger the expected payoff of limited punishment strategies when on the NE path will be mutual co-operation in all periods:

\[\Pi\left(C\right)=R\left(1+\delta+\delta^2+\ldots\right)=\frac{R}{1-\delta} \]

The payoff for defection will be T in the first round, P for the number of rounds of punishment, and R thereafter. One round of punishment is TFT, infinite rounds of punishment is grim trigger. For 2 rounds of punishment the expected payoff is

\[\Pi\left(D\right)=T+\delta P+\delta^2P+\delta^3R\left(1+\delta+\delta^2+\ldots\right)=T+\delta\left(1+\delta\right)P+\frac{\delta^3R}{1-\delta} \]

For 3 rounds of punishment the expected payoff is

\[\Pi\left(D\right)=T+\delta P+\delta^2P+\delta^3P+\delta^3R\left(1+\delta+\delta^2+\ldots\right)=T+\delta\left(1+\delta+\delta^2\right)P+\frac{\delta^4R}{1-\delta} \]

These are just some examples, but this can be generalised. Suppose the number of rounds of punishment is k, then the expected payoff of defection is

\[\Pi\left(D\right)=T+P\sum_{s=1}^{k}\delta^s+\delta^{k+1}R\left(1+\delta+\delta^2+\ldots\right)=T+P\sum_{s=1}^{k}\delta^s+\frac{\delta^{k+1}R}{1-\delta} \]

For the limited punishment to be a NE, playing C always must be better in expectation than deviating, thus:

\[\frac{R}{1-\delta}\geq T+P\sum_{s=1}^{k}\delta^s+\frac{\delta^{k+1}R}{1-\delta} \] \[R\geq T-\delta T+\left(1-\delta\right)P\sum_{s=1}^{k}\delta^s+\delta^{k+1}R \] \[\delta T-\delta^{k+1}R-\left(1-\delta\right)P\sum_{s=1}^{k}\delta^s\geq T-R \]

When equated to find the critical value, the above equation does not (to my knowledge) have an analytical solution. It can be simplified, though, to make a solution easier to find. As the payoffs of both actions are the same in expectation after the punishment, these terms cancel out, thus we only need to consider payoffs in the first round and in every round of punishment. We can expand the payoff of co-operating slightly:

\[\frac{R}{1-\delta}=R+R\sum_{s=1}^{k}\delta^s+\frac{\delta^{k+1}R}{1-\delta} \]

When substituting into the inequality above, the last term cancels out, thus we have the NE condition of:

\[R+R\sum_{s=1}^{k}\delta^s\geq T+P\sum_{s=1}^{k}\delta^s \] \[\left(R-P\right)\sum_{s=1}^{k}\delta^s\geq T-R \]

Therefore, we have the condition:

\[\sum_{s=1}^{k}\delta^s\geq\frac{T-R}{R-P} \]

It is clear how we recover the TFT condition from this with \(k=1\), and letting \(k\rightarrow\infty\), a little rearranging gives the grim trigger condition also. As the left-hand side is strictly increasing in \(k\), the critical values for all limited punishment strategies fall between those of TFT and grim trigger, decreasing monotonically in \(k\) from the TFT value to the grim trigger value.

This can be simplified further as this is just a geometric series \(\sum_{s} a_s\) where \({a_{s+k}}/{a_s}=\delta^k\). The general simplification for this is:

\[\sum_{s=1}^{k}\delta^s=\frac{\delta\left(1-\delta^k\right)}{1-\delta} \]

If \(k=1\) (TFT) this simplifies to \(\delta\), and as \(k\rightarrow\infty\) (Grim trigger) this will tend to \(\frac{\delta}{1-\delta}\). The condition for NE can now be shown as:

\[\frac{\delta\left(1-\delta^k\right)}{1-\delta}\geq\frac{T-R}{R-P} \]

The solutions of this for \(k=1\) and \(k\rightarrow\infty\) are simple as terms cancel or tend to zero, but solving for any other integer is analytically awkward and tends to involve roots. For example, take \(k=2\); the condition is:

\[\frac{\delta\left(1-\delta^2\right)}{1-\delta}=\delta\left(1+\delta\right)\geq\frac{T-R}{R-P} \] \[\Rightarrow\delta^2+\delta-\frac{T-R}{R-P}\geq0 \]

This can be solved:

\[\delta\geq\frac{-1+\sqrt{1+4\frac{T-R}{R-P}}}{2} \]

Whilst this is not too complex and can easily be found, it does involve a square root, and higher powers will involve higher-order roots.

If we let \(\frac{T-R}{R-P}=X\), then one can rearrange:

\[\frac{\delta\left(1-\delta^k\right)}{1-\delta}\geq X \] \[\delta^{k+1}-\left(X+1\right)\delta+X\le0 \]

To my knowledge, it is not possible to solve this as a function of \(k\) (i.e. one cannot solve it without first knowing the value of \(k\)). One can find the solutions using formulas up to \(k=3\), but beyond this there is no general formula, therefore numerical methods are required.
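
As a sketch, the critical values can be found numerically with numpy.roots (much as the plotting code further down does), and the k=2 closed form above can be checked against the numerical root. The payoffs here are the paper’s R=48 game, chosen so that a solution below 1 exists:

Code
import numpy as np

# Find the critical delta solving delta^(k+1) - (X + 1)*delta + X = 0
# inside [delta_Grim, delta_TFT], where X = (T - R) / (R - P).
T, R, S, P = 50, 48, 12, 25
X = (T - R) / (R - P)

def critical_delta(k):
    coeff = [1] + [0] * (k - 1) + [-(X + 1), X]   # polynomial of order k + 1
    roots = np.roots(coeff)
    real = roots.real[np.abs(roots.imag) < 1e-9]
    lo, hi = (T - R) / (T - P), (T - R) / (R - P)
    candidates = [r for r in real if lo - 1e-8 <= r <= hi + 1e-8]
    return min(candidates)

for k in (2, 3, 5, 10):
    print(k, round(critical_delta(k), 4))

# Closed form for k = 2: delta = (-1 + sqrt(1 + 4X)) / 2
print(round((-1 + np.sqrt(1 + 4 * X)) / 2, 4))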

4.5 Win-Stay Lose-Switch (Pavlov)

This strategy states that the player should stick with what’s working and switch if it doesn’t. Outcomes are considered either a ‘win’ or a ‘loss’: if one co-operates, then mutual co-operation is a win and perseveres, but being defected against is a loss, and will be met with defection in kind (changing action). Defecting against a player who co-operates is also a win, but if they punish you by defecting also, this is a loss, and will be responded to by co-operating again. Therefore, it is easy to find the NE as it will be supported by the same values of \(\delta\) as TFT. The reason for this is that the on-strategy path is to always co-operate, and the deviation from this will be a defection, then one round of punishment, and then back to co-operation, the same deviating path as TFT.

\[\delta_{WSLS}=\delta_{TFT}=\frac{T-R}{R-P} \]

5 Subgame Perfect Equilibrium

I have shown under what conditions the previous strategies can be supported as a NE, but this does not ensure they are Subgame Perfect. To be a SPE the strategy must (necessarily and sufficiently) satisfy the one-deviation property, which states that no player can increase their payoff by changing action for 1 period given any history. Given this definition, not all of the above strategies are SPE. The strategies that are both NE and SPE are AD (for any \(\delta\)), Grim trigger (for \(\delta\ge\delta_{Grim}\)), and WSLS (for \(\delta\ge\delta_{WSLS}\)).

5.1 Grim Trigger

The reason Grim Trigger is SPE is as follows. The outcome path will either be (C, C) in every period or (D, D) in every period. For (C, C), the on-path payoff is \(\frac{R}{1-\delta}\), and deviating for 1 period gets the player T for one period and P thereafter equalling \(T+\frac{\delta P}{1-\delta}\). Therefore, there is no incentive to deviate if \(\delta\geq\delta_{Grim}=\frac{T-R}{T-P}\). When the outcome is (D, D) there is no incentive to change to C as it’s worse. Thus, Grim trigger can be a SPE.

5.2 TFT

For TFT, there are 4 possible histories (C, C), (C, D), (D, C) or (D, D). Assume player 2 is playing TFT.

After (C, C), sticking to TFT gives \(\frac{R}{1-\delta}\), while deviating to D for one period will get T, but then player 1 goes back to C at the same time as player 2 reacts and defects, so player 1 gets S. They then get in a loop of (D, C) then (C, D) and so on, thus the payoff is:

\[T+\delta S+\delta^2T+\delta^3S+\cdots=\frac{T+\delta S}{1-\delta^2} \]

For SPE we therefore need:

\[\frac{R}{1-\delta}\ge\frac{T+\delta S}{1-\delta^2} \]

\[\delta\ge\frac{T-R}{R-S} \]

After (C, D), sticking to the TFT will see (D, C) then (C, D) and so on, for a payoff of \(\frac{T+\delta S}{1-\delta^2}\). If they deviate for only one period (playing C), the outcome will be (C, C) which means that TFT leads to (C, C) forever after. Thus, the payoff of deviation is \(\frac{R}{1-\delta}\). Therefore, we need:

\[\delta\le\frac{T-R}{R-S} \]

After (D, C) sticking to TFT again sees the loop but getting S before T, so the payoff is \(\frac{\delta T+S}{1-\delta^2}\). Deviating for one period will mean playing D still, and player 2 will have switched to D, so the outcome is (D, D). TFT states that they will both punish each other by playing D, so they play D forever giving the payoff of deviating of \(\frac{P}{1-\delta}\). For SPE we need:

\[\frac{\delta T+S}{1-\delta^2}\geq\frac{P}{1-\delta} \] \[\delta\ge\frac{P-S}{T-P} \]

Finally, after (D, D), the TFT path is D forever giving the payoff of \(\frac{P}{1-\delta}\). Deviating for one period will give S for a period, but then they get into the (D, C) to (C, D) loop again giving a payoff of \(\frac{\delta T+S}{1-\delta^2}\). For SPE we need:

\[\frac{P}{1-\delta}\geq\frac{\delta T+S}{1-\delta^2} \] \[\delta\le\frac{P-S}{T-P} \]

For TFT to be a SPE, all 4 of these conditions must be satisfied, which implies a \(\delta\) which satisfies the equation below:

\[\delta_{TFT-SPE}=\frac{T-R}{R-S}=\frac{P-S}{T-P} \]
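
For the paper’s two games this knife-edge condition clearly fails, as a quick check shows:

Code
# TFT is a SPE only if (T - R) / (R - S) equals (P - S) / (T - P).
T, S, P = 50, 12, 25

for R in (32, 48):
    lhs = (T - R) / (R - S)
    rhs = (P - S) / (T - P)
    print(R, round(lhs, 3), round(rhs, 3), lhs == rhs)
# R = 32: 0.9   vs 0.52 -> not SPE
# R = 48: 0.056 vs 0.52 -> not SPE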

5.3 Limited Punishment

The solutions to this are generally difficult to find analytically, as the functional form depends on the number of periods of punishment. The NE condition, shown below, highlights this issue:

\[\sum_{s=1}^{k}\delta^s\geq\frac{T-R}{R-P} \]

If \(k=1\) (TFT) the function is linear, if \(k=2\) it is quadratic, and so on. The NE requires one to solve a polynomial of order \(k\), which is easy when \(k\) is small but increasingly difficult, and may admit multiple solutions, as \(k\) increases. SPE has a similar issue. I have already found the condition for which TFT is a SPE, which covers \(k=1\). Therefore, let \(k=2\), and consider the same 4 histories plus a couple more that depend on how many rounds of punishment have passed.

After (C, C) the on-path outcome will be C forever, giving a payoff of \(\frac{R}{1-\delta}\). If one deviates for a period they gain T, but they will be punished for 2 periods after. In the first period of punishment they get S, but they will then reciprocate the punishment, hence the outcome in the 2nd punishment period is (D, D), giving P. This resets the punishment from the opponent, so they will continually punish each other forever. Thus, the payoff of the deviation is \(T+\delta S+\frac{\delta^2P}{1-\delta}\). For the SPE we then need:

\[\frac{R}{1-\delta}\geq T+\delta S+\frac{\delta^2P}{1-\delta} \] \[\delta\left(R-S\right)-\delta^2\left(P-S\right)\geq\left(1-\delta\right)\left(T-R\right) \] \[\left(P-S\right)\delta^2-\left(T-S\right)\delta+\left(T-R\right)\le0 \]

This quadratic can be solved via the quadratic equation but has no simplified solution past this point.

If the opponent is playing D, this means that they are punishing the player, and this can be either their first or second round of punishment. Thus, there are two cases here where the opponent plays D: if this is the first round of punishment then the history of the previous 2 rounds is either {(D, C), (C, D)}, {(D, C), (D, D)} or {(D, D), (D, D)}; if this is the second round of punishment then the history is {(C, D), (D, D)}.

After {(D, C), (C, D)}, staying on strategy means the next outcome will be (D, D), and they will punish each other forever. The payoff is \(\frac{P}{1-\delta}\). Deviation for one period means the next outcome will be (C, D) and payoff is S. Going back to the strategy means the next outcome will be (D, C) for a payoff of T. The next period both players will be punishing each other, but lagged a period, thus we get (D, D) forever. Thus, the payoff of deviation is \(S+\delta T+\frac{\delta^2P}{1-\delta}\). Therefore, for SPE we need:

\[\frac{P}{1-\delta}\geq S+\delta T+\frac{\delta^2P}{1-\delta} \] \[\left(T-P\right)\delta^2-\left(T-S\right)\delta+P-S\geq0 \]

This can be solved and simplified:

\[\delta=\frac{T-S\pm\sqrt{\left(T-S\right)^2-4\left(T-P\right)\left(P-S\right)}}{2\left(T-P\right)} \] \[\delta=\frac{T-S\pm\left(T+S-2P\right)}{2\left(T-P\right)} \]

Therefore, we have either \(\delta\le\frac{P-S}{T-P}\) or \(\delta\geq1\). The second condition is impossible as \(\delta<1\), thus for SPE it is necessary that:

\[\delta\le\frac{P-S}{T-P} \]

After {(D, C), (D, D)} the strategy states that the next outcome will be (D, D) again, thus they get P forever. Deviating for a period will lead to (C, D) for a payoff of S. The outcome after this will be (D, C), then (D, D) after that. This is the same as the previous case, so the condition is the same. The same can be said after history {(D, D), (D, D)}, as the strategy will lead to (D, D) forever and deviating will lead to (C, D), (D, C), then (D, D) forever also.

After {(C, D), (D, D)} the next on-strategy outcome will be (D, D) again as the opponent will start a new 2-round punishment. The deviation outcome will be (C, D) for a payoff of S, and the outcome after will be (D, D) forever after. This is strictly worse for any \(\delta\) as \(S<P\).

The last history to check is the only other outcome where the opponent plays C. The previous (C, C) history requires both players to have always played C, but the outcome (D, C) does not. It does however require the player to have played C previously. The opponent could have played either C or D. Therefore, we need to check both {(C, D), (D, C)} and {(C, C), (D, C)}. The former has been explored somewhat already. After {(C, D), (D, C)} the next on-strategy outcome will be (D, D), as the player punishes for a second period and the opponent reciprocally punishes. This leads to P forever. Deviating for one period will lead to (C, D), then (D, D) forever which is strictly worse.

After {(C, C), (D, C)} the next on-strategy outcome will be (D, D) as the player is punished. This will be reciprocated, and the outcomes will be (D, D) forever. If they deviate the outcome will be (C, D), then (D, D) forever. The deviation is strictly worse again. (This I was unsure about).

In summary, there are only 2 conditions that must be satisfied for \(k=2\) to satisfy a SPE:

\[\delta\le\frac{P-S}{T-P} \] \[\left(P-S\right)\delta^2-\left(T-S\right)\delta+\left(T-R\right)\le0 \]

For the payoffs in the paper’s games:

\[\delta\le\frac{13}{25}=0.52 \] \[13\delta^2-38\delta+\left(50-R\right)\le0 \] \[\Rightarrow\frac{38-\sqrt{1444-52\ast\left(50-R\right)}}{26}\le\delta\le\frac{38+\sqrt{1444-52\ast\left(50-R\right)}}{26} \] \[R=32\Rightarrow0.59\le\delta\le2.33 \] \[R=48\Rightarrow0.054\le\delta\le2.87 \]

For \(R=32\), there is never an SPE, but for \(R=48\) the lower bound is very low, so the SPE is possible.
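
A short numeric check of these bounds (my own verification of the arithmetic above):

Code
import numpy as np

# SPE conditions for k = 2: delta <= (P - S) / (T - P) and
# (P - S)*delta^2 - (T - S)*delta + (T - R) <= 0.
T, S, P = 50, 12, 25
upper = (P - S) / (T - P)          # 13/25 = 0.52

for R in (32, 48):
    roots = np.roots([P - S, -(T - S), T - R])
    lo, hi = sorted(roots.real)
    feasible = lo <= upper          # the quadratic holds on [lo, hi]
    print(R, round(lo, 3), round(hi, 3), feasible)
# R = 32: [0.595, 2.328] -> no overlap with delta <= 0.52, never SPE
# R = 48: [0.054, 2.869] -> SPE possible for 0.054 <= delta <= 0.52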

The lower bound of \(\delta\) is determined by:

\[\delta\geq\frac{T-S-\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{2\left(P-S\right)} \]

It follows that it is necessary that:

\[\frac{T-S-\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{2\left(P-S\right)}\le\frac{P-S}{T-P} \] \[\frac{\left(T-S\right)\left(T-P\right)-2\left(P-S\right)^2}{T-P}\le\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)} \] Squaring both sides (assuming the left-hand side is non-negative): \[\bigg((T-S)-\frac{2(P-S)^2}{T-P}\bigg)^2\le\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right) \] \[(T-S)^2-\frac{4(T-S)(P-S)^2}{T-P}+\frac{4(P-S)^4}{(T-P)^2}\le\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right) \] \[\frac{(T-S)(P-S)^2}{T-P}-\frac{(P-S)^4}{(T-P)^2}-(P-S)(T-R)\ge0 \]

This is pretty intractable at this point, which leads me to believe that higher values of k will be especially intractable and difficult to decipher any SPE from.

5.4 WSLS

Win-Stay Lose-Switch has the same possible histories as TFT, but differing actions thereafter. From (C, C), the on-strategy action is to play C forever, with a payoff of \(\frac{R}{1-\delta}\). If one deviates for one period (plays D), they will gain T, which is a win, thus they stay with D. The other player lost, so switches to D, so the outcome is (D, D) with a payoff of \(\delta P\). They both consider this a loss and both switch to C forever with a payoff of \(\frac{\delta^2R}{1-\delta}\). Thus, the payoff of defecting for a period is \(T+\delta P+\frac{\delta^2R}{1-\delta}\). Therefore, for SPE we need:

\[\frac{R}{1-\delta}\geq T+\delta P+\frac{\delta^2R}{1-\delta} \] \[\delta\geq\frac{T-R}{R-P} \]

After (C, D), WSLS states that one would switch and the opponent stays, so the outcome is (D, D) and payoff is P. Then both switch to C forever and get \(\frac{\delta R}{1-\delta}\). If one deviates then they don’t initially switch, the outcome is still (C, D) and they get S. The payoff of deviating for a period is then \(S+\delta P+\frac{\delta^2R}{1-\delta}\). Therefore, for SPE we need:

\[P+\frac{\delta R}{1-\delta}\geq S+\delta P+\frac{\delta^2R}{1-\delta} \] \[\delta\geq-\frac{P-S}{R-P} \]

This is always satisfied as \(P>S\).

After (D, C), staying on WSLS means staying with D, but one’s opponent will switch, thus the outcome will be (D, D). Then they both switch to C forever, so the payoff will be \(P+\frac{\delta R}{1-\delta}\). Deviating for a period means switching also, so the outcome is (C, D), the following steps were explained prior, so the payoff will be \(S+\delta P+\frac{\delta^2R}{1-\delta}\). The SPE condition is the same as before, so will be satisfied also.

Finally, after (D, D), if both follow the strategy they will play (C, C) forever, giving the payoff of \(\frac{R}{1-\delta}\). If one deviates for a period, the outcome will be (D, C), giving T. In the following period the opponent will switch to D, the payoff will be \(\delta P\), and the payoff after they both switch to C will be \(\frac{\delta^2R}{1-\delta}\). This is the same SPE condition as after (C, C), so will be satisfied if that is.

Thus, for WSLS to be a SPE we only need \(\delta\geq\frac{T-R}{R-P}\). This is the same condition as the NE condition, therefore if WSLS is a NE, it is also a SPE.

6 Summary of NE and SPE

Strategy NE SPE
AD Yes Yes
AC No No
Grim Trigger $$\delta\ge\frac{T-R}{T-P} $$ $$\delta\ge\frac{T-R}{T-P} $$
Tit-for-Tat $$\delta\ge\frac{T-R}{R-P}$$ $$\delta=\frac{T-R}{R-S}=\frac{P-S}{T-P}$$
Limited Punishment $$\frac{\delta(1-\delta^k)}{1-\delta}\ge\frac{T-R}{R-P}$$ For k=2, $$\frac{T-S-\sqrt{(T-S)^2-4(P-S)(T-R)}}{2(P-S)}\le\delta\le\frac{P-S}{T-P}$$ For k>2, I don't know.
Win-Stay-Lose-Switch $$\delta\ge\frac{T-R}{R-P}$$ $$\delta\ge\frac{T-R}{R-P}$$
Figure 3: Summary of NE and SPE

See Figure 3.

7 Graphing the Critical Values of \(\delta\)

Not all of these need graphing, and in fact the only interesting thing to graph here is the limited punishment. As it is the general case for TFT and grim trigger, the limited punishment NE solutions lie between these two, depending on the number of periods of punishment. Luckily the equation is a sum of a geometric series, therefore it has a closed form simplification. Thus: \[\frac{\delta(1-\delta^k)}{1-\delta}\ge\frac{T-R}{R-P} \]

The solutions for this can be numerically approximated for increasing values of \(k\) (and some appropriately chosen payoffs) and plotted to show the path between TFT and Grim trigger. The above can be written as a polynomial of order \(k+1\), where \(X=(T-R)/(R-P)\): \[\delta^{k+1}-(X+1)\delta+X\le0 \]

What will help in graphing the critical values that satisfy a LP NE is that the geometric series is strictly increasing in both \(\delta\) and \(k\), thus as we increase the value of \(k\) it must be that the critical value of \(\delta\) will be smaller. We also know the extreme values of the critical \(\delta\) are at the extreme values of \(k\) (\(k=1\) & \(k=\infty\)), thus we know that every critical value of \(\delta\) must lie between these two points. Thus, one can state that in general the critical value (\(\bar{\delta}\)) must satisfy: \[\bar{\delta}\in\bigg[\frac{T-R}{T-P},\frac{T-R}{R-P}\bigg] \]

This means that when attempting to find the solutions to the \((k+1)\)-order polynomials one can restrict attention to the above interval. As \(\bar{\delta}\) is falling in \(k\), we can restrict this interval further when moving from \(k\) to \(k+1\): \[\bar{\delta}_{k+1}\in\bigg[\frac{T-R}{T-P}, \bar{\delta}_k\bigg] \]

The graph below (Figure 4) shows the \(\bar\delta\) for NE for an increasing number of punishment periods. It includes a slider for the value of R (restricted above P but less than T). It shows that at low \(k\), limited punishment strategies cannot be a NE unless R is sufficiently large. For TFT to be a NE, R needs to be at least 38.

Code
# cSpell:disable

import numpy as np

T = 50
P = 25
K = list(range(1, 21))

def solve(k, R):
    # Solve delta^(k+1) - (X+1)*delta + X = 0 for the critical delta of a
    # k-period limited punishment strategy, where X = (T-R)/(R-P).
    z = k - 1
    X = (T-R) / (R-P)
    zeros = [0] * z
    delta_TFT = X                  # k = 1 threshold
    delta_GRIM = (T-R) / (T-P)     # k -> infinity threshold
    error = 0.00000001
    if z == 0:
        delta_crit = delta_TFT
    else:
        # Polynomial coefficients for delta^(k+1) - (X+1)*delta + X
        coeff = [1, *zeros, -X-1, X]
        roots = np.roots(coeff)
        y = np.iscomplex(roots)
        roots_minus_complex = []
        delta_crit = []
        for j in range(len(coeff)-1):
            if not y[j]:
                roots_minus_complex.append(roots[j].real)
        # Keep the real root lying between the grim trigger and TFT thresholds
        for j in range(len(roots_minus_complex)):
            if (roots_minus_complex[j] >= delta_GRIM-error) and (roots_minus_complex[j] <= delta_TFT+error):
                delta_crit = roots_minus_complex[j]
    return delta_crit

deltas = []
for j in range(1, int(T-P+1)):
  R = P+j
  delta=[]
  for i in range(len(K)):
    delta.append({'k': i+1, 'delta': solve(K[i], R)})
  deltas.append({'R': R, 'data': delta})
  delta=[]

ojs_define(lp_crit_delta_data = deltas)
# cSpell:enable
Code
import {slider} from '@jashkenas/inputs'
viewof R = slider({
  min:26, 
  max:49,
  step:1,
  value:26,
  title:'R'
})
lp_crit_delta_index=Math.round(R-26)
Code
Plot.plot({
  x: {
    label: "k",
    labelAnchor: 'center',
    line: true,
    domain: [0, lp_crit_delta_data[lp_crit_delta_index]['data'].length]
  },
  y: {
    label: "δ",
    labelAnchor: 'center',
    line: true,
    grid: true,
    domain: [0, Math.max(lp_crit_delta_data[lp_crit_delta_index]['data'][0]['delta'],1)]
  },
  marks: [
    Plot.line(lp_crit_delta_data[lp_crit_delta_index]['data'], 
    {x: 'k', y: 'delta', stroke: 'darkred'}
    ),
    Plot.ruleY([1])
  ]
})
Code
Inputs.table(lp_crit_delta_data[lp_crit_delta_index]['data'], {
  columns: [
    "k",
    "delta"
  ],
  header: {
    x: "k",
    delta: "δ"
  }
})
Figure 4: Limited Punishment Critical NE Discount Rates for k Punishment Periods

8 Using the Decomposition

8.1 Grim Trigger

The condition for Grim trigger to be both a NE and SPE can be expressed using the decomposition as so:

\[\delta\ge\frac{T-R}{T-P}=\frac{2\eta_1}{\eta_1-\eta_2+2\beta} \]

8.2 Tit-for-Tat

Suppose we substituted the decomposed elements in for the payoffs for the critical values for TFT to be a NE:

\[\delta\ge\frac{T-R}{R-P}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta} \]

Now suppose we do the same for the condition for TFT to also be a SPE:

\[\delta=\frac{T-R}{R-S}=\frac{P-S}{T-P} \] \[\Rightarrow\frac{2\eta_1}{-\eta_1+\eta_2+2\beta}=\frac{2\eta_2}{\eta_1-\eta_2+2\beta} \]

Note that the denominator of the left term is larger than the denominator of the NE critical value, thus the SPE condition requires a δ which is too low for the NE condition. So TFT cannot be both NE and SPE. A little rearranging of the SPE condition will yield:

\[\eta_1(\eta_1+2\beta)=\eta_2(\eta_2+2\beta) \]

This condition implies that, for TFT to be a SPE, it is necessary that \(\eta_1=\eta_2\). This is not a sufficient condition.
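
For the paper’s games \(\eta_1\ne\eta_2\), so TFT cannot be subgame perfect there; a one-line check:

Code
# TFT can only be a SPE if eta_1 == eta_2, i.e. T - R == P - S.
T, S, P = 50, 12, 25
for R in (32, 48):
    eta_1, eta_2 = (T - R) / 2, (P - S) / 2
    print(R, eta_1, eta_2, eta_1 == eta_2)
# R = 32: 9.0 vs 6.5; R = 48: 1.0 vs 6.5 -> never equal, so TFT is not SPE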

8.3 Limited Punishment

The NE solution is:

\[\delta(1-\delta^k )/(1-\delta)\ge\frac{T-R}{R-P}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta} \]

8.4 Win-Stay Lose-Switch

The critical value is the same as that of TFT, but for WSLS it applies to both NE and SPE:

\[\delta\ge\frac{T-R}{R-P}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta} \]

8.5 Graphing the critical values as the decomposition changes

Code
# cSpell:disable

import numpy as np
import math

eta_1 = 9
eta_2 = 6.5
beta_min = (3 * eta_1 - eta_2) / 2
beta_min_round = math.ceil(beta_min)
beta_max = 30
kappa = 30
K = list(range(1, 21))

def solve(k, beta):
    z = k - 1
    X = 2 * eta_1 / (-eta_1 - eta_2 + 2 * beta)
    zeros = [0] * z
    delta_TFT = max(X, 0)
    delta_GRIM = max(2 * eta_1 / (eta_1 - eta_2 + 2 * beta), 0)
    error = 0.00000001
    if z == 0:
        delta_crit = delta_TFT
    else:
        coeff = [1, *zeros, -X-1, X]
        roots = np.roots(coeff)
        y = np.iscomplex(roots)
        roots_minus_complex = []
        delta_crit = []
        for j in range(len(coeff)-1):
            if not y[j]:
                roots_minus_complex.append(roots[j].real)
        for j in range(len(roots_minus_complex)):
            if (roots_minus_complex[j] >= delta_GRIM-error) and (roots_minus_complex[j] <= delta_TFT+error):
                delta_crit = roots_minus_complex[j]
    return delta_crit

deltas = []
for j in range(beta_min_round, beta_max+1):
  delta=[]
  for i in range(len(K)):
    delta.append({'k': i+1, 'delta': solve(K[i], j)})
  deltas.append({'beta': j, 'data': delta})
  delta=[]
        
ojs_define(lp_crit_delta_beta_data = deltas)
# cSpell:enable
Code
viewof lp_beta = slider({
  min:11, 
  max:30,
  step:1,
  value:11,
  title:'β'
})
lp_crit_delta_beta_index=Math.round(lp_beta-11)
Code
Plot.plot({
  x: {
    label: "k",
    labelAnchor: 'center',
    line: true,
    domain: [0, lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'].length]
  },
  y: {
    label: "δ",
    labelAnchor: 'center',
    line: true,
    grid: true,
    domain: [0, Math.max(lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'][0]['delta'],1)]
  },
  marks: [
    Plot.line(lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'], 
    {x: 'k',y: 'delta', stroke: 'darkred'}
    ),
    Plot.ruleY([1])
  ]
})
Code
Inputs.table(lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'], {
  columns: [
    "k",
    "delta"
  ],
  header: {
    x: "k",
    delta: "δ"
  }
})
Figure 5: Limited Punishment Critical NE Discount Rates for k Punishment Periods

See Figure 5.

9 Competing Strategies

In the previous sections I have shown the NE and SPE conditions for each of the above strategies and shown how the decomposition relates to these. All of these conditions rest on the assumption that one’s opponent plays the same strategy as you. This is clearly not always the case, and many previous experiments (including the one discussed above) have shown that players choose different strategies and change strategies over time. On this note, it is important to look at how these strategies fare against one another, and then see how the decomposition affects this. This will allow us to infer how the decomposition of PDs affects the relative proportions at which these strategies tend to be played. The popular method to assess how strategies perform against one another is through evolutionary stability.

As a general point, every ESS is also a NE, but not vice versa; a strategy is an ESS if it is a strict best response to itself. Therefore, all of the NE conditions need to hold with strict inequality for the strategy to be considered an ESS.

To analyse the dynamics, I will assess how each strategy performs against one another using replicator dynamics. This uses the fitness of a strategy (its payoff) to determine whether the strategy will become more or less prominent: if good (high fitness) it will replicate and be played more, and if bad (low fitness) it will shrink and not be played. Strategies which cannot be invaded are ESS.

Let there be an established strategy \(\sigma\in\Sigma\). Suppose a proportion \(\lambda>0\) of players switch to a mutant strategy \(\hat{\sigma}\ne\sigma\), and the remaining \((1-\lambda)\) continue playing \(\sigma\). The average strategy is:

\[\bar{\sigma}=\lambda\hat{\sigma}+(1-\lambda)\sigma \]

The fitness of strategy \(\sigma\) is defined as the payoff against this average strategy, denoted \(\pi(\sigma,\bar{\sigma})\). The growth of these strategies over time is defined as their relative performance to the average fitness. Thus, if the fitness of the mutant strategy \(\hat{\sigma}\) against the average strategy is higher than the relevant fitness of the incumbent strategy \(\sigma\), then the proportion of mutants \(\lambda\) will grow: \(\lambda_{t+1}>\lambda_t\).

Replicator dynamics allow us to study these changes. Let the current state of the system be \(x_t(\sigma_i)\), which is the proportion of players playing strategy \(i\) in time \(t\). The differential equation for \(x_t\) is given by:

\[\dot{x_t}(\sigma_i)=[\pi(\sigma_i, \bar{\sigma})-\pi(\bar{\sigma}, \bar{\sigma})]x_t(\sigma_i) \]

To solve for this we solve \(\dot{x}=0\), but in the prisoners’ dilemma it is very unlikely that any strategy will be stable, as most can encroach on one another.

It should be noted that if the mutant strategy is dominated by the incumbent (AC and AD, for example) then the mutant will die out, and vice versa.

9.1 Tit-for-Tat Vs Always Defect

First, suppose the incumbent strategy is AD. The fitness of TFT and AD against the average strategy \(\bar{\sigma}=x\circ\sigma_{TFT}+(1-x)\circ\sigma_D\) are below:

\[\Pi(\sigma_{TFT}, \bar{\sigma})=x\frac{R}{1-\delta}+(1-x)\frac{S+\delta P}{1-\delta^2} \] \[\Pi(\sigma_{D}, \bar{\sigma})=x\frac{T+\delta P}{1-\delta^2}+(1-x)\frac{P}{1-\delta} \]

Using these two equations we can find the fitness of the average strategy against itself:

\[\Pi(\bar{\sigma},\bar{\sigma})= x\Pi(\sigma_{TFT}, \bar{\sigma})+ (1-x)\Pi(\sigma_D, \bar{\sigma}) \]

\[\Pi(\bar{\sigma},\bar{\sigma})= x^2\frac{R}{1-\delta}+x(1-x)\frac{S+\delta P}{1-\delta^2}+ x(1-x)\frac{T+\delta P}{1-\delta^2}+(1-x)^2\frac{P}{1-\delta} \]

The replicator dynamics are given by:

\[\dot{x}=x[\Pi(\sigma_{TFT}, \bar{\sigma})-\Pi(\bar{\sigma}, \bar{\sigma})] \]

Substituting in yields the differential equation:

\[\dot{x}=x\bigg[x(1-x)\frac{R}{1-\delta}+ (1-x)^2\frac{S+\delta P}{1-\delta^2}- x(1-x)\frac{T+\delta P}{1-\delta^2}- (1-x)^2\frac{P}{1-\delta}\bigg] \]

\[\dot{x}=x(1-x)\bigg(\frac{1}{1-\delta^2}\bigg) [x((R-T)+\delta(R-P))+(1-x)(S-P)] \]

\[\dot{x}=\bigg(\frac{1}{1-\delta^2}\bigg)x(1-x) [x((P-S)-(T-R)+\delta(R-P))-(P-S)] \]

This has 3 roots: \(\bar{x}=0\), \(\bar{x}=1\) and \(\bar{x}=\frac{P-S}{(P-S)-(T-R)+\delta(R-P)}\).

This can be simplified somewhat, and the decomposition can be substituted in:

\[\dot{x}=\bigg(\frac{1}{1-\delta^2}\bigg)x(1-x) [x(2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta))-2\eta_2] \]

The interior root can then be expressed as \(\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta)}\).

Code
# cSpell:disable
import numpy as np

T=50
R=40
P=25
S=12
X=list(np.linspace(0, 1, 101))
Delta_range=list(np.linspace(0.1, 0.9, 9))

def solve(x, delta):
  x_dot=x*(1-x)*(x*(P-S-T+R+delta*(R-P))-P+S)/(1-delta**2)
  return x_dot

x_dots = []
for j in range(len(Delta_range)):
  x_dot=[]
  for i in range(len(X)):
    x_dot.append({'x': X[i], 'x_dot': solve(X[i], float(Delta_range[j]))})
  x_dots.append({'delta': Delta_range[j], 'data': x_dot})

ojs_define(rep_dyn_tft_ad_data = x_dots)
# cSpell:enable
Code
viewof tft_ad_delta = slider({
  min: 0.1,
  max: 0.9,
  step: 0.1, 
  value: 0.1,
  title: 'δ'
})
rep_dyn_tft_ad_index=Math.round(10*tft_ad_delta-1)
Code
Plot.plot({
  x: {
    label: "x",
    labelAnchor: "center",
    line: true,
    domain: [0, 1],
  },
  y: {
    label: "ẋ",
    labelAnchor: "center",
    line: true,
    grid: true,
  },
  marks: [
    Plot.line(rep_dyn_tft_ad_data[rep_dyn_tft_ad_index]['data'],
    {x: 'x', y: 'x_dot', stroke: 'darkred'}),
    Plot.ruleY([0])
  ]
})
Code
Inputs.table(rep_dyn_tft_ad_data[rep_dyn_tft_ad_index]['data'], {
  columns: [
    "x",
    "x_dot"
  ],
  header: {
    x: "x",
    x_dot: "ẋ"
  }
})
Figure 6: Replicator Dynamics for TFT against AD

From Figure 6 it can be seen that below a certain discount factor TFT cannot infiltrate AD, which is analogous to the NE. Note that the differential equation is equal to zero at \(x=0\) and \(x=1\). The other root of the function is:

\[\bar{x}=\frac{P-S}{(P-S)-(T-R)+\delta(R-P)} \]

\[\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta)} \]

The NE condition can be recovered from this. For the TFT NE to exist the root must be less than or equal to 1, which leads to the same NE condition found before:

\[\delta\ge\frac{T-R}{R-P} \]

Even then, TFT requires the proportion of TFT players to be larger than this root in order for it to become fully dominant (x=1). This is also an unstable solution, as anything less will lead to TFT dying out, and anything more will lead to TFT being the entire population.

This goes to show that even if a strategy is a NE, it still may be unlikely to be prominent as it cannot invade a population always defecting.
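
To see the instability of the interior root directly, the sketch below integrates the replicator equation with a simple Euler step from starting points just below and just above \(\bar{x}\) (using illustrative values R=40 and \(\delta=0.8\), chosen only for this example):

Code
# Euler integration of the TFT vs AD replicator dynamics:
# x_dot = x(1-x)[x((P-S)-(T-R)+delta(R-P)) - (P-S)] / (1 - delta^2)
T, R, S, P = 50, 40, 12, 25
delta = 0.8

def x_dot(x):
    return x * (1 - x) * (x * ((P - S) - (T - R) + delta * (R - P)) - (P - S)) / (1 - delta**2)

x_bar = (P - S) / ((P - S) - (T - R) + delta * (R - P))
print(round(x_bar, 3))   # interior (unstable) root

for x0 in (x_bar - 0.05, x_bar + 0.05):
    x = x0
    for _ in range(2000):      # step size 0.01, so roughly t = 0..20
        x += 0.01 * x_dot(x)
    print(round(x0, 3), '->', round(x, 3))
# Starting below x_bar, TFT dies out (x -> 0); starting above, it takes over (x -> 1).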

In an effort to find how to maximise TFT, one could try to minimise the root of the differential equation. One can immediately see that the root is strictly falling in \(R\) and strictly increasing in \(T\). This suggests that minimising the strategic element \(\eta_1=\frac{T-R}{2}\) will decrease the root. This is also evident once the decomposition is substituted in.

The derivative with respect to P is more complex, but the numerator of the derivative can be written as:

\[f(\delta)=\delta(R-S)-(T-R) \]

This shows that the root of the above differential equation will be increasing in P if \(\delta>\frac{T-R}{R-S}\). As \(S<P\), this value for \(\delta\) is smaller than the critical value for the TFT NE, thus this condition will be satisfied. Therefore, one can conclude:

\[\frac{\partial\bar{x}}{\partial P}>0 \]

As for S, the effect is also less clear, as S affects both the numerator and the denominator. The numerator of the derivative with respect to S is:

\[f(\delta)=[(T-R)-\delta(R-P)] \]

Therefore, the root is strictly falling in S if \(\delta>\frac{T-R}{R-P}\), thus we want the strategic element \(\eta_2=\frac{P-S}{2}\) to be as small as possible.

One can find the derivative with respect to \(\beta\), and it is clear that \(\bar{x}\) is falling in the behavioural component. Interestingly, this derivative is quite dependent on \(\delta\):

\[\frac{\partial\bar{x}}{\partial\beta}=\frac{-4\delta\eta_2}{(2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta))^2} \]

This shows that, keeping the game strategically the same, the higher the discount factor, the smaller the effect the behavioural component has on this invasion threshold for TFT against AD. More importantly, at low discount factors, possibly below the critical \(\delta\), the behavioural component has a larger effect in reducing the threshold, making it a powerful tool.

Figure 7 below shows the replicator dynamics for different values of the decomposition.

Code
# cSpell:disable
import numpy as np

eta_1=list(np.linspace(1, 5, 5))
eta_2=list(np.linspace(1, 5, 5))
beta=list(np.linspace(5, 15, 11))

X=list(np.linspace(0, 1, 101))
Delta_range=list(np.linspace(0.1, 0.9, 9))

def solve_x(x, delta, eta_1, eta_2, beta):
  x_dot=x*(1-x)*(x*(2*eta_2-2*eta_1+delta*(-eta_1-eta_2+2*beta))-2*eta_2)/(1-delta**2)
  return x_dot

def solve_delta(delta, eta_1, eta_2, beta):
  if beta>(eta_1+eta_2):
    if (2*eta_2-2*eta_1+delta*(-eta_1-eta_2+2*beta))>0:
      delta_crit=2*eta_2/(2*eta_2-2*eta_1+delta*(-eta_1-eta_2+2*beta))
    else:
      delta_crit=''
  else:
    delta_crit=''
  return delta_crit

delta_crits = []
for i in range(len(Delta_range)):
  for j in range(len(eta_1)):
    for k in range(len(eta_2)):
      for l in range(len(beta)):
        delta_crits.append(solve_delta(float(Delta_range[i]), float(eta_1[j]), float(eta_2[k]), float(beta[l])))

x_dots = []

for i in range(len(Delta_range)):
  for j in range(len(eta_1)):
    for k in range(len(eta_2)):
      for l in range(len(beta)):
        x_dot=[]
        if (2*float(eta_2[k])-2*float(eta_1[j])+float(Delta_range[i])*(-float(eta_1[j])-float(eta_2[k])+2*float(beta[l]))) != 0:
          x_bar = 2*float(eta_2[k])/(2*float(eta_2[k])-2*float(eta_1[j])+float(Delta_range[i])*(-float(eta_1[j])-float(eta_2[k])+2*float(beta[l])))
        else:
          x_bar = ''
        for m in range(len(X)):
          x_dot.append({'x': X[m], 'x_dot': solve_x(X[m], float(Delta_range[i]), float(eta_1[j]), float(eta_2[k]), float(beta[l]))})
        x_dots.append({'delta': Delta_range[i], 'eta_1': eta_1[j], 'eta_2': eta_2[k], 'beta': beta[l], 'data': x_dot, 'x_bar': x_bar})

ojs_define(rep_dyn_tft_ad_decomp_data = x_dots)
# cSpell:enable
Code
viewof tft_ad_decomp_eta_1 = slider({
  min: 1,
  max: 5,
  step: 1,
  value: 1,
  title: 'η₁'
})
viewof tft_ad_decomp_eta_2 = slider({
  min: 1,
  max: 5,
  step: 1,
  value: 1,
  title: 'η₂'
})
viewof tft_ad_decomp_beta = slider({
  min: 5,
  max: 15,
  step: 1,
  value: 5,
  title: 'β'
})
viewof tft_ad_decomp_delta = slider({
  min: 0.1,
  max: 0.9,
  step: 0.1,
  value: 0.1,
  title: 'δ'
})
rep_dyn_tft_ad_decomp_index = Math.round(
  2750*(tft_ad_decomp_delta-0.1)+55*(tft_ad_decomp_eta_1-1
  )+11*(tft_ad_decomp_eta_2-1)+tft_ad_decomp_beta-5
  )
rep_dyn_tft_ad_decomp_x_bar_var = rep_dyn_tft_ad_decomp_data[rep_dyn_tft_ad_decomp_index]['x_bar']
viewof tft_ad_decomp_x_bar = {
  const element = htl.html`<div>x̄=${rep_dyn_tft_ad_decomp_x_bar_var}</div>`
  return element
}
Code
Plot.plot({
  x: {
    label: "x",
    labelAnchor: 'center',
    line: true,
    domain: [0, 1]
  },
  y: {
    label: "ẋ",
    labelAnchor: 'center',
    line: true,
    grid: true,
  },
  marks: [
    Plot.line(rep_dyn_tft_ad_decomp_data[rep_dyn_tft_ad_decomp_index]['data'], 
    {x: 'x',y: 'x_dot', stroke: 'darkred'}
    ),
    Plot.ruleY([0])
  ]
})
Code
Inputs.table(rep_dyn_tft_ad_decomp_data[rep_dyn_tft_ad_decomp_index]['data'], {
  columns: [
    "x",
    "x_dot"
  ],
  header: {
    x: "x",
    x_dot: "ẋ"
  }
})
Figure 7: Replicator Dynamics for TFT against AD

9.2 Grim Trigger Vs Always Defect

Suppose that AD is the incumbent strategy and Grim trigger wishes to invade. The fitness of each strategy against the average strategy \(\bar{\sigma}=x\circ\sigma_{Grim}+(1-x)\circ\sigma_{D}\) is:

\[\Pi(\sigma_{Grim},\bar{\sigma})=x\frac{R}{1-\delta}+(1-x)(S+\frac{\delta P}{1-\delta}) \]

\[\Pi(\sigma_{D}, \bar{\sigma})=x(T+\delta\frac{P}{1-\delta})+(1-x)(\frac{P}{1-\delta}) \]

Combining these to give the fitness of the average strategy against itself:

\[\Pi(\bar{\sigma}, \bar{\sigma})=x^2\frac{R}{1-\delta}+x(1-x)(S+\frac{\delta P}{1-\delta})+(1-x)x(T+\frac{\delta P}{1-\delta})+(1-x)^2\frac{P}{1-\delta} \]

The replicator dynamics are given by:

\[\dot{x}=x\bigg[x(1-x)\frac{R}{1-\delta}+(1-x)^2(S+\frac{\delta P}{1-\delta})-x(1-x)(T+\frac{\delta P}{1-\delta})-(1-x)^2\frac{P}{1-\delta}\bigg] \]

\[\dot{x}=x(1-x)[x((P-S)-(T-R)+\frac{\delta}{1-\delta}(R-P))-(P-S)] \]

This has 3 roots: \(\bar{x}=0, \bar{x}=1\) and \(\bar{x}=\frac{P-S}{(P-S)-(T-R)+\frac{\delta}{1-\delta}(R-P)}\)

The inner solution in terms of the decomposition is:

\[\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\frac{\delta}{1-\delta}(-\eta_1-\eta_2+2\beta)} \]
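
Since \(\frac{\delta}{1-\delta}>\delta\) for \(\delta\in(0,1)\), the invasion barrier for Grim trigger is lower than that for TFT at the same discount factor. A short comparison sketch (with the same illustrative R=40 used above):

Code
# Interior roots (invasion barriers) for TFT vs AD and Grim vs AD.
T, R, S, P = 50, 40, 12, 25

def x_bar_tft(delta):
    return (P - S) / ((P - S) - (T - R) + delta * (R - P))

def x_bar_grim(delta):
    return (P - S) / ((P - S) - (T - R) + delta / (1 - delta) * (R - P))

for delta in (0.6, 0.8, 0.9):
    print(delta, round(x_bar_tft(delta), 3), round(x_bar_grim(delta), 3))
# A value above 1 means the strategy cannot invade AD at that delta.
# The Grim barrier is smaller, so Grim needs a smaller initial share to invade AD.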

9.3 Limited Punishment Strategies Vs Always Defect

Suppose the number of periods of punishment is \(k\), then against the average strategy of \(\bar{\sigma}=x\circ\sigma_{LP}+(1-x)\circ\sigma_D\), the fitness of limited punishment is:

\[\Pi(\sigma_{LP}, \bar{\sigma})=x\frac{R}{1-\delta}+(1-x)\frac{S+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}} \]

The fitness of AD against the average is:

\[\Pi(\sigma_{D}, \bar{\sigma})=x\frac{T+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}+(1-x)\frac{P}{1-\delta} \]

Therefore, the fitness of the average strategy against itself is:

\[\Pi(\bar{\sigma}, \bar{\sigma})=x^2\frac{R}{1-\delta}+x(1-x)\frac{S+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}+x(1-x)\frac{T+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}+(1-x)^2\frac{P}{1-\delta} \]

The replicator dynamics are therefore given by:

\[\dot{x}=x\bigg[x(1-x)\frac{R}{1-\delta}+(1-x)^2\frac{S+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}-x(1-x)\frac{T+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}-(1-x)^2\frac{P}{1-\delta}\bigg] \]

\[\dot{x}=\frac{1}{1-\delta^{k+1}}x(1-x)[x((P-S)-(T-R)+\frac{\delta(1-\delta^k)}{1-\delta}(R-P))-(P-S)] \]

The roots of this are \(\bar{x}=0\), \(\bar{x}=1\) and \(\bar{x}=\frac{P-S}{(P-S)-(T-R)+\frac{\delta(1-\delta^k)}{1-\delta}(R-P)}\).

It can easily be seen that this is the general case for TFT \((k=1)\) and Grim Trigger \((k=\infty)\). And when the decomposition is used:

\[\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\frac{\delta(1-\delta^k)}{1-\delta}(-\eta_1-\eta_2+2\beta)} \]

The effect of each component on this root depends on the discount factor, but the behavioural component is the only one that the discount factor scales directly.

For the corresponding figure, see Replicator Dynamics for Limited Punishment.
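
The same root can also be evaluated for different punishment lengths; a minimal sketch, again assuming the R = 48 game and an illustrative δ = 0.75:

Code
# Interior root for limited punishment against AD as a function of k.
# k = 1 reproduces the TFT root; large k approaches the Grim trigger root.
T, R, S, P = 50, 48, 12, 25
delta = 0.75
def x_bar(k):
    factor = delta*(1 - delta**k)/(1 - delta)
    return (P - S) / ((P - S) - (T - R) + factor*(R - P))
for k in [1, 2, 5, 20]:
    print(k, round(x_bar(k), 4))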

9.4 Win-Stay Lose-Switch Vs Always Defect

Next, consider the WSLS strategy against the incumbent AD. The fitness of each strategy against the average strategy \(\bar{\sigma}=x\circ\sigma_{WSLS}+(1-x)\circ\sigma_D\) is the same as for TFT, as the two strategies follow the same on- and off-equilibrium paths, so the replicator dynamics and interior root are also the same.

9.5 Win-Stay Lose-Switch Vs Tit-for-Tat

Suppose that TFT is the incumbent strategy and WSLS wishes to invade. The fitness of each strategy against the average strategy \(\bar{\sigma}=x\circ\sigma_{WSLS}+(1-x)\circ\sigma_{TFT}\) is the same, because both strategies prioritise playing C, and without any D choices they will always play C. Thus, they perform as well as each other and neither will invade the other. This no longer holds if there is a small fraction of D choices, perhaps mistakes or opportunistic choices intended to ‘test the waters’. Let Tit-for-Tat be the incumbent, and assume that in every period a fraction \(\epsilon\) of all players make a mistake for one period: can WSLS take advantage of the defectors and invade? Let the strategies with error be \(\sigma_{TFTE}\) and \(\sigma_{WSLSE}\); then the average strategy in the population is:

\[\bar{\sigma}=x[(1-\epsilon)\circ\sigma_{WSLS}+\epsilon\circ\sigma_{WSLSE}]+(1-x)[(1-\epsilon)\circ\sigma_{TFT}+\epsilon\circ\sigma_{TFTE}] \]

The fitness of WSLS against the new average strategy is found by:

\[\Pi(\sigma_{WSLS}, \bar{\sigma})=x\frac{R}{1-\delta}+(1-x)[(1-\epsilon)(R+\delta Q_{C,C})+\epsilon(S+\delta Q_{C,D})] \]

Here, \(Q_{C,C}\) is the value of the future payoffs after a C, C outcome, when paired with a TFTE player, and likewise \(Q_{C,D}\) is the value of the future payoffs after a C, D outcome, when paired with a TFTE player. We can find expressions for these terms by expanding them and looking for patterns. Take \(Q_{C,C}\): the on-strategy choice is to continue to play C, thus the next outcome just depends on whether the TFTE player makes a mistake. If the TFTE player plays C, the outcome will be C, C, they both get R now, and the future payoffs will be \(Q_{C,C}\). If the TFTE player makes a mistake and plays D, the outcome will be C,D, and the payoff will be S plus the discounted value of the payoffs after a C,D outcome \((Q_{C,D})\):

\[Q_{C,C}=(1-\epsilon)(R+\delta Q_{C,C})+\epsilon(S+\delta Q_{C,D}) \]

In order to evaluate this further, we need to delve deeper into the term \(Q_{C,D}\). WSLS states that after a C,D outcome one should switch to D. If one’s TFTE opponent follows their strategy they will revert back to C, leading to a D,C outcome. This will give the highest payoff T for one period, but will lead to punishment from the TFTE opponent. The following outcome again depends on mistakes: D,D if they make no mistake, leading to a payoff of P; or D,C if they mistakenly still play C, leading to a high payoff of T again. The total future payoff after the former outcome can be denoted as \(Q_{D,D}\), and for the latter outcome it is \(Q_{D,C}\).

If, instead, the TFTE opponent makes a mistake, they will still play D. This leads to a D,D outcome with a payoff of P. Both players, if following their strategies, will then switch to C, for a payoff of R. If the TFTE player makes a mistake we will get C,D again, giving a payoff of S. This can all be brought together below.

\[Q_{C,D}=(1-\epsilon)[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta Q_{D,C})]+\epsilon[P+\delta(1-\epsilon)(R+\delta Q_{C,C})+\delta\epsilon(S+\delta Q_{C,D})] \]

We also know that:

\[Q_{D,C}=(1-\epsilon)(P+\delta Q_{D,D})+\epsilon(T+\delta Q_{D,C}) \]

Therefore, the penultimate step is to find an expression for \(Q_{D,D}\). Again, this follows the logic stated before:

\[Q_{D,D}=(1-\epsilon)(R+\delta Q_{C,C})+\epsilon(S+\delta Q_{C,D}) \]

These 4 equations will provide us with solutions to substitute into the fitness equation. Immediately, the first 3 equations can be simplified as terms exist on both sides of the equation:

\[Q_{C,C}=\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\frac{\epsilon}{1-\delta(1-\epsilon)}(S+\delta Q_{C,D}) \]

\[Q_{C,D}=\frac{1-\epsilon}{1-\delta^2\epsilon^2}[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta Q_{D,C})]+\frac{\epsilon}{1-\delta^2\epsilon^2}[P+\delta(1-\epsilon)(R+\delta Q_{C,C})+\delta\epsilon S] \]

\[Q_{D,C}=\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\frac{\epsilon}{1-\delta\epsilon}T \]

Substituting \(Q_{C,C}\) reduces us to three equations:

\[Q_{C,D}=\frac{1-\epsilon}{1-\delta^2\epsilon^2}[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta Q_{D,C})]+\frac{\epsilon}{1-\delta^2\epsilon^2}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}(S+\delta Q_{C,D}))+\delta\epsilon S] \]

\[Q_{D,C}=\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\frac{\epsilon}{1-\delta\epsilon}T \]

\[Q_{D,D}=(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\epsilon(S+\delta Q_{C,D}) \]

Simplify \(Q_{D,D}\) such that \(Q_{D,D}\) is only on the left of the equation:

\[Q_{D,D}=\frac{(1-\epsilon)(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\frac{\epsilon(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(S+\delta Q_{C,D}) \]

Now substitute in \(Q_{D,C}\) to leave us with two equations:

\[Q_{C,D}=\frac{1-\epsilon}{1-\delta^2\epsilon^2}[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\delta\frac{\epsilon}{1-\delta\epsilon}T)]+\frac{\epsilon}{1-\delta^2\epsilon^2}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}(S+\delta Q_{C,D}))+\delta\epsilon S] \]

\[Q_{D,D}=\frac{(1-\epsilon)(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\frac{\epsilon(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(S+\delta Q_{C,D}) \]

One can then simplify \(Q_{C,D}\) and substitute in for \(Q_{D,D}\):

\[Q_{C,D}=\frac{(1-\epsilon)(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[T+\delta(1-\epsilon)P+\frac{\delta^2(1-\epsilon)}{1-\delta\epsilon}Q_{D,D}+\delta\epsilon(T+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)]+\frac{\epsilon(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}S)+\delta\epsilon S] \]

Substituting in for \(Q_{D,D}\):

\[Q_{C,D}=\frac{(1-\epsilon)(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[T+\delta(1-\epsilon)P+\frac{\delta^2(1-\epsilon)}{1-\delta\epsilon}\frac{(1-\epsilon)(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\frac{\delta^2(1-\epsilon)}{1-\delta\epsilon}\frac{\epsilon(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(S+\delta Q_{C,D})+\delta\epsilon(T+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)]+\frac{\epsilon(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}S)+\delta\epsilon S] \]

\[Q_{C,D}=\frac{(1-\delta(1-\epsilon))[(1-\epsilon)(1-\delta\epsilon+2\delta^2(1-\epsilon)^2)T+((1-\delta\epsilon+\delta^2(1-\epsilon)^2)(\delta(1-\epsilon)+\epsilon(1-\delta))+\delta^3(1-\epsilon)^4)P]}{(1-\delta\epsilon)[(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))]}+\frac{\delta^2(1-\epsilon)^2+\delta\epsilon(1-\epsilon)(1-\delta\epsilon)-\delta^3(1-\epsilon)^4}{(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))}R+\frac{\delta^3\epsilon(1-\epsilon)^2(1-2\epsilon)-\delta^2\epsilon(2\epsilon-1)+\delta\epsilon^2}{(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))}S \]

Substituting in to get \(Q_{C,C}\):

\[Q_{C,C}=\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\frac{\epsilon}{1-\delta(1-\epsilon)}S+\frac{\delta\epsilon}{1-\delta(1-\epsilon)}Q_{C,D} \]

\[Q_{C,C}=\frac{\delta\epsilon}{1-\delta\epsilon}\frac{(1-\epsilon)(1-\delta\epsilon+2\delta^2(1-\epsilon)^2)T+((1-\delta\epsilon+\delta^2(1-\epsilon)^2)(\delta(1-\epsilon)+\epsilon(1-\delta))+\delta^3(1-\epsilon)^4)P}{(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))}+\frac{\delta(1-\epsilon)(1-\delta(1-\epsilon))(\delta(1-\epsilon)+\epsilon(1-\delta\epsilon)-\delta^2(1-\epsilon)^2)+(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\epsilon-\delta^2\epsilon^2(1-\epsilon)+\delta(1-\epsilon)^2)}{(1-\delta(1-\epsilon))[(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))]}R+\frac{\epsilon(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))+\delta\epsilon(1-\delta(1-\epsilon))(\delta^2(1-\epsilon)^2(1-3\epsilon)-\delta(2\epsilon-1)+\epsilon)}{(1-\delta(1-\epsilon))[(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))]}S \]
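
Rather than substituting by hand, the four recursions can also be solved numerically as a check on the algebra above; a minimal sketch, where the values of δ and ε are illustrative choices rather than anything from the text:

Code
# Fixed-point iteration on the four Q recursions (for delta < 1 the updates are
# a contraction, so the iteration converges). Payoffs are the paper's R = 48 game.
T, R, S, P = 50, 48, 12, 25
delta, eps = 0.75, 0.05
Q_CC = Q_CD = Q_DC = Q_DD = 0.0
for _ in range(10000):
    Q_CC = (1 - eps)*(R + delta*Q_CC) + eps*(S + delta*Q_CD)
    Q_CD = ((1 - eps)*(T + delta*(1 - eps)*(P + delta*Q_DD) + delta*eps*(T + delta*Q_DC))
            + eps*(P + delta*(1 - eps)*(R + delta*Q_CC) + delta*eps*(S + delta*Q_CD)))
    Q_DC = (1 - eps)*(P + delta*Q_DD) + eps*(T + delta*Q_DC)
    Q_DD = (1 - eps)*(R + delta*Q_CC) + eps*(S + delta*Q_CD)
print(Q_CC, Q_CD, Q_DC, Q_DD)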

References

Dal Bó, Pedro, and Guillaume R. Fréchette. 2019. “Strategy Choice in the Infinitely Repeated Prisoner’s Dilemma.” American Economic Review 109 (11): 3929–52. https://doi.org/10.1257/aer.20181480.
Jessie, D., and D. G. Saari. 2016. “From the Luce Choice Axiom to the Quantal Response Equilibrium.” Journal of Mathematical Psychology 75: 3–9. https://doi.org/10.1016/j.jmp.2015.10.001.
Source Code
---
title: "Infinitely Repeated Prisoners' Dilemma and Decomposition"
author: "Alex Ralphs"
toc: true # Table of Contents
# lof: true # list of figures
# lot: true # list of tables
number-sections: true
bibliography: references.bib
link-citations: true
crossref:
  eq-prefix: ""
highlight-style: pygments
execute:
  echo: true
format:
  html:
    code-tools: true
    code-fold: true
    html-math-method: mathjax
    smooth-scroll: true
    page-layout: full
    self-contained: true
  # pdf:
  #   geometry:
  #     - top=30mm
  #     - left=20mm
  # docx: default
jupyter: python3
editor_options: 
  chunk_output_type: inline
---

<head>

<!-- Remote Stylesheet -->

<link rel="stylesheet" href="repeated-pd-styles.css"> <!-- embedded Stylesheet -->

```{=html}
<style>
    #content {
      display: block;
    }

    .internal {
       display: block;
    }
</style>
```
</head>

<body>

## Repeated Games

Let's start with a simple 2x2 game commonly used for repeated games: a prisoner's dilemma (PD). The PD is used for repeated games as the defect action is no longer dominant when the game is repeated. The payoff matrix is as below @fig-pd-payoff-matrix, where T\>R\>P\>S.

```{python}
#| label: fig-pd-payoff-matrix
#| fig-cap: "Prisoners' Dilemma Payoff Matrix"
from IPython.display import display, HTML
payoff_matrix = open("payoff_matrix.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = 'T',
R = 'R',
S = 'S',
P = 'P')
display(HTML(payoff_matrix))
```

One could argue that it is not necessary for R\>P, for this game to be considered a PD, but this condition implies that mutual co-operation is better than mutual defection, a generally desired quality. I imagine this is determined by the behavioural component also. Apparently, the 'iterated' (repeated) PD requires 2R\>T+S such that mutual co-operation is still of greater payoff that alternating co-operation and defection.

The repeated PD allows for many more possible strategies that are also NE. This differs from the one-shot game where defect is dominant. Such strategies are Tit-for-tat (with or without forgiveness), Pavlov (win-stay, lose-switch), and Grim Trigger. The best strategy depends on the strategic breakdown of the population. In certain situations, a particular strategy may be the best strategy but could also be far worse in another situation.

## Decomposition

Using the decomposition of [@Jessie_Saari_2016], the 3 decomposed parts are below:

```{python}
from IPython.display import display, HTML

payoff_matrix = open("2x2_decomposition_general.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = 'T',
R = 'R',
S = 'S',
P = 'P',
eta_1 = r'\frac{T-R}{2}',
eta_2 = r'\frac{P-S}{2}',
beta = r'\frac{T+R-S-P}{4}',
kappa = r'\frac{T+R+S+P}{4}')

display(HTML(payoff_matrix))
```

The decomposition of the PD can be represented as 4 numbers: The strategic elements for both outcomes where the row players defects (@eq-strategic-element-1)(@eq-strategic-element-2), the behavioural element for either outcome where the row player co-operates (@eq-behavioural-element), and any of the kernel elements (@eq-kernel-element). This is a general product of the decomposition in 2x2 symmetrical games, and all these numbers will be positive. For the strategic elements, the 'defect' actions elements were chosen as they are positive (whereas the 'co-operation' actions are negative). Likewise, 'co-operation' is chosen for the behavioural element, as the PD defines this as being positive. This positive behavioural component is what makes a co-operative strategy what it is (analogous to a Hawk-Dove game). The PD is therefore described as:


$$\eta_1=\frac{T-R}{2}>0
$$ {#eq-strategic-element-1} 
$$\eta_2=\frac{P-S}{2}>0
$$ {#eq-strategic-element-2} 
$$\beta=\frac{T+R-S-P}{4}>0
$$ {#eq-behavioural-element} 
$$\kappa=\frac{T+R+S+P}{4}
$$ {#eq-kernel-element}

The prior condition that 2R\>T+S, which ensures that mutual co-operation is better than alternating co-operation and defection, can be written in terms of the decomposition. Each payoff is the sum of a strategic, behavioural and a kernel element:

$$R=-\eta_1+\beta+\kappa
$$ 
$$T=\eta_1+\beta+\kappa
$$ 
$$S=-\eta_2-\beta+\kappa
$$

Therefore:

$$2R=2(-\eta_1+\beta+\kappa)>\eta_1-\eta_2+2\kappa=T+S
$$ 
$$\Rightarrow2\beta>3\eta_1-\eta_2
$$
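
A small numerical check of this condition, using the decomposition directly; a minimal sketch, with the payoffs taken from one of the stage games used below:

```{python}
# Map (T, R, S, P) to the decomposed elements and check that 2R > T + S and
# 2*beta > 3*eta_1 - eta_2 agree, as derived above.
def decompose(T, R, S, P):
    return (T - R)/2, (P - S)/2, (T + R - S - P)/4, (T + R + S + P)/4

T, R, S, P = 50, 48, 12, 25
eta_1, eta_2, beta, kappa = decompose(T, R, S, P)
print(eta_1, eta_2, beta, kappa)
print(2*R > T + S, 2*beta > 3*eta_1 - eta_2)
```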

When the game is repeated N number of times, the payoff is simply the sum of the payoffs in all of the realised outcomes.

## Infinitely Repeated Games

Results from [@Dal_Bó_&_Fréchette_2019] suggest that 3 strategies can explain the majority of chosen strategies in infinitely repeated PDs: Always defect (AD); Tit-for-Tat (TFT); and Grim Trigger (Grim). They also find that as the probability of continuation increases, subjects tend to choose shorter punishments (prefer TFT over Grim trigger). Also, a low payoff to mutual co-operation is associated with Suspicious TFT, which starts with defect instead of co-operate.

Their experimental design did not actually infinitely repeat the PD (as this would be impossible), but instead randomly terminated the repeating process. This is common, as participants do not know the end point, thus should treat every round the same, just as in an infinitely repeated game.

They had 2 phases: in phase 1 the participants played the game as usual; in phase 2 the participants were asked to specify a strategy. Phase 2 is where they elicited strategies, and the method by which they did this varied between sessions. In one session they would be asked to make an initial decision (C or D), and then a second decision conditional on the 4 possible outcomes of the first game (e.g. if (C, C) then C, if (C, D) then D). These are 'memory-one' strategies, as they only look backwards one period. In other sessions, they could choose from more strategies (including 'memory-one'):

-   Always Co-op or always Defect
-   Co-op for X rounds, then Defect for the rest
-   Randomise C and D with X%
-   Grim Trigger
-   TFT or STFT
-   If both co-ordinate, then Co-operate, o/w Defect (WSLS) (Pavlov)
-   Co-op, but if anyone Defects in first X rounds, then defect (Grim-X)
-   TFXT - only punish if Defect for X rounds
-   Co-op, but if anyone Defects in first X rounds, then switch to defect for X rounds, repeat (T-X)
-   Build your own (Memory One)

The stage game used is shown below. Interestingly they only change the value of R, keeping T, S and P constant. The possible values for R are 32 or 48. They also have the probability of continuation of 0.5, 0.75 and 0.9. A higher value will lead to longer expected supergames. They also use a point system where 100 points is equal to \$0.45, thus it may be beneficial to convert the payoffs into the monetary equivalents.

```{python}
from IPython.display import display, HTML
payoff_matrix = open("payoff_matrix.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = 50,
R = 'R',
S = 12,
P = 25)
display(HTML(payoff_matrix))
```

A simple table showing the decomposition of these two games is below.

```{=html}
<table class="pd-decomp-table" style="width:80% !important">
  <tr class="table_header">
    <th class="table_header">R</th>
    <th class="table_header">$$\eta_1$$</th>
    <th class="table_header">$$\eta_2$$</th>
    <th class="table_header">$$\beta$$</th>
    <th class="table_header">$$\kappa$$</th>
  </tr>
  <tr class="table_row">
    <th class="table_contents">32</th>
    <th class="table_contents">9</th>
    <th class="table_contents">6.5</th>
    <th class="table_contents">11.25</th>
    <th class="table_contents">29.75</th>
  </tr>
  <tr class="table_row">
    <th class="table_contents">48</th>
    <th class="table_contents">1</th>
    <th class="table_contents">6.5</th>
    <th class="table_contents">15.25</th>
    <th class="table_contents">33.75</th>
  </tr>
</table>
```

The full decomposition of both games is below also.

```{python}
from IPython.display import display, HTML
T=50
R=32
S=12
P=25
payoff_matrix = open("2x2_decomposition.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = T,
R = R,
S = S,
P = P, 
eta_1 = (T-R)/2, 
eta_2 = (P-S)/2,
beta = (T+R-S-P)/4, 
kappa = (T+R+S+P)/4)

display(HTML(payoff_matrix))
```

```{python}
from IPython.display import display, HTML
T=50
R=48
S=12
P=25
payoff_matrix = open("2x2_decomposition.html").read().format(Action_1='Co-operate',
Action_2='Defect', 
T = T,
R = R,
S = S,
P = P, 
eta_1 = (T-R)/2, 
eta_2 = (P-S)/2,
beta = (T+R-S-P)/4, 
kappa = (T+R+S+P)/4)

display(HTML(payoff_matrix))
```

One can see that the increase in R both decreases the strategic element when the opponent plays 'Co-op' and increases the behavioural component of playing 'Co-op' oneself. This is as expected really, and both would explain some of the data from the authors' experiments. The general results from their paper are shown below in @fig-dal_bo_frechette_table_3.

![[@Dal_Bó_&_Fréchette_2019]: Table 3](Dal%20Bo%20and%20Frechette%20Table%203.png){#fig-dal_bo_frechette_table_3}

```{=html}
<script>
const slider = document.getElementById('fig-dal_bo_frechette_table_3');
let isDown = false;
let startX;
let scrollLeft;
slider.addEventListener('mousedown', (e) => {
  isDown = true;
  slider.classList.add('active');
  startX = e.pageX - slider.offsetLeft;
  scrollLeft = slider.scrollLeft;
});
slider.addEventListener('mouseleave', () => {
  isDown = false;
  slider.classList.remove('active');
});
slider.addEventListener('mouseup', () => {
  isDown = false;
  slider.classList.remove('active');
});
slider.addEventListener('mousemove', (e) => {
  if(!isDown) return;
  e.preventDefault();
  const x = e.pageX - slider.offsetLeft;
  const walk = (x - startX); //scroll-fast
  slider.scrollLeft = scrollLeft - walk;
  console.log(walk);
});
</script>
```
The results show some interesting trends. It seems that a higher value of R will lead to more TFT and Grim Trigger strategies (and more AC), and fewer AD and STFT (and others). Similarly, the higher the probability of continuation the higher the prevalence of TFT and Grim Trigger.

The decomposition would suggest that more C would be played when R is larger, and this is partly reflected by an increase in the prevalence of AC strategies.

## Nash Equilibria

### Always Defect

It is a Nash equilibrium to defect in every round. This is stated in a folk theorem: any NE of the stage game, played in every round, will be a NE of the extended, finitely repeated game. This also makes sense through backward induction, as both players would be best off by defecting in the final game, but knowing this will also defect in the game prior, and so on. In no round will it be beneficial to co-operate, thus mutual defection is the NE. For co-operation to be supported under NE, the players need to be unaware of the number of rounds $N$. However, experiments have shown that people do co-operate even when the number of rounds is known, and some of the more common strategies are below.

### Grim Trigger

This is a well-known strategy which aims to infinitely punish bad behaviour. The strategy states that one should co-operate until one's opponent defects, in which case defect forever after. This can be seen as a less forgiving, and generally under-performing, version of tit-for-tat, as a tit-for-tat strategy will only punish for as long as one's opponent attempts to defect. To find the NE we must compare the payoff of the Grim trigger strategy and a deviation in one round. Suppose that in period $t$ both players have played C and no-one has defected. We can assume this (see other authors, I ain't proving this). The total expected payoff, given the probability of continuation of $\delta$, for player $i$ of action $a$ in period $t$, is given by:

$$\Pi(a)=\sum_{t=0}^{\infty} \delta^t \pi_i (a_i^t, a_{-i}^t)
$$

Given that both players mutually co-operated last period $(t-1)$ the expected payoffs of co-operating and defecting are:

$$
\Pi(C)=R(1+\delta+\delta^2+\cdots)=\frac{R}{1-\delta}
$$

$$
\Pi(D)=T+\delta P(1+\delta+\delta^2+\cdots)=T+\frac{\delta P}{1-\delta}
$$

Here it is of course assumed that both players are playing a grim trigger strategy, thus if they play C they will play this forever if it is preferred now as there is an infinite horizon (i.e. decision now is identical to the decision tomorrow if they play C, hence choice will be the same). It isn't necessary to assume this, as it is implied anyway, but we can also state that the strategy starts with playing C.

Also, if they defect, then their opponent defects next period, thus there is mutual defection forever after the first defection. For the grim trigger to be NE it is necessary (and sufficient) that $\Pi(C)\ge\Pi(D)$. This simplifies to:

$$\frac{R}{1-\delta}\ge T+\frac{\delta P}{1-\delta}
$$ 
$$\delta\ge\frac{T-R}{T-P}
$$

One can then represent this in terms of the decomposition to understand how the strategic and behavioural components affect this NE:

$$\delta\ge\bar{\delta}_{Grim}=\frac{2\eta_1}{\eta_1-\eta_2+2\beta}
$$

This critical value of $\delta_{Grim}$ is both strategic and behavioural, increasing in both strategic elements and decreasing in the behavioural component. This implies that as the strategic elements increase (making defect relatively higher paying) the range of values for $\delta$ which are NE is smaller, so one would expect fewer Grim trigger strategies to be used. On the other hand, the larger the behavioural component, the smaller $\delta_{Grim}$, which increases the range of possible values of $\delta$ which are a NE. One would then expect more Grim trigger strategies as the behavioural component increases. Intuitively, one could argue that this is because the behavioural component increases R, which is true, but this could be countered by the same increase in T, whereas the behavioural component also reduces P, making the grim trigger punishment harsher, and a more effective deterrent of defection. To see this, take the inequality for the critical value below:

$$R+\frac{\delta R}{1-\delta}\ge T+\frac{\delta P}{1-\delta}
$$

The first terms are the payoff in period t of co-operation (R) and defection (T) respectively. The second terms are the expected payoffs that follow from these decisions: R forever if co-operate; and P forever if deviate. Let's rearrange this such that these effects are on the same side as each other:

$$T-R\le \frac{\delta}{1-\delta}(R-P)
$$ 
$$2\eta_1\le\frac{\delta}{1-\delta}(-\eta_1-\eta_2+2\beta)
$$

The left-hand side is the initial gain from deviation, and the right-hand side is the expected future gain from co-operation. For NE the right must outweigh the left. If the behavioural component increases, both R and T increase by the same nominal amount, therefore leaving the left-hand side unchanged, they cancel each other out. Thus, behavioural changes are only influencing the future gains of co-operation.

The strategic influence is quite different as it affects both sides: with $\eta_1$ increasing the left and decreasing the right; whilst $\eta_2$ only decreases the right. Thus, even though both strategic elements work in the same direction on $\bar{\delta}_{Grim}$, $\eta_1$ is more powerful (the effect is larger).
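
As a sanity check, $\bar{\delta}_{Grim}$ can be computed from the payoffs and from the decomposition for the paper's two games; a minimal sketch:

```{python}
# Critical delta for Grim trigger, from the payoffs and from the decomposition.
T, S, P = 50, 12, 25
for R in (32, 48):
    eta_1, eta_2, beta = (T - R)/2, (P - S)/2, (T + R - S - P)/4
    print(R, (T - R)/(T - P), 2*eta_1/(eta_1 - eta_2 + 2*beta))
```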

### Tit-for-Tat (TFT)

Generally considered the most robust strategy, the tit-for-tat strategy states that one will co-operate in the first round, and then in each subsequent round, do whatever one's opponent did in the previous round. Tit-for-Tat is a special case of a limited punishment strategy where the player only punishes for 1 period. One can extend the punishment to longer finite lengths or even infinity, which would be a grim trigger.

To find the critical values for the TFT NE one uses the same steps as for the grim trigger NE. Again, the prior choices will have been mutual co-operation, as stated in the TFT strategy. Given this history, the payoff of continuing to co-operate is still R, and will forever be R in NE.

$$\Pi(C)=R(1+\delta+\delta^2+\cdots)=\frac{R}{1-\delta}$$

This is the same on-NE path expected payoff as the grim trigger strategy. The payoff for defecting will be T in the first period, P in the next and then R forever, as the punishment has finished.

$$\Pi(D)=T+\delta P+\delta^2R(1+\delta+\delta^2+\cdots)=T+\delta P+\frac{\delta^2R}{1-\delta}
$$

Therefore, for TFT to be a NE this following inequality must be satisfied:

$$\frac{R}{1-\delta}\ge T+\delta P+\frac{\delta^2R}{1-\delta}
$$ 
$$R+\delta R+\frac{\delta^2R}{1-\delta}\ge T+\delta P+\frac{\delta^2R}{1-\delta}
$$
$$\delta\ge\delta_{TFT}=\frac{T-R}{R-P}
$$

This was simplified, but one can find the result via the quadratic also. One solution will not be within the range of zero to one, and is disregarded.

$$R\geq T-\delta T+\delta\left(1-\delta\right)P+\delta^2R
$$ 
$$\delta^2\left(R-P\right)-\delta\left(T-P\right)+T-R\le0
$$ 
$$\delta=\frac{T-P\pm\sqrt{\left(T-P\right)^2-4\left(R-P\right)\left(T-R\right)}}{2\left(R-P\right)}
$$ 
$$\sqrt{\left(T-P\right)^2-4\left(R-P\right)\left(T-R\right)}=\sqrt{T^2-2TP+P^2+4R^2-4RT+4TP-4RP}
$$ 
$$=\sqrt{T^2+2TP+P^2+4R^2-4RT-4RP}=\sqrt{\left(T-2R+P\right)^2}
$$ 
$$\delta=\frac{T-P\pm\left(T-2R+P\right)}{2\left(R-P\right)}
$$

Again, I can insert the decomposition into the critical value:

$$\delta_{TFT}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta}
$$

The implication here is similar to grim trigger: both strategic elements increase the critical value, but $\eta_1$ by more so than $\eta_2$; the behavioural component decreases the critical value. The only difference between the critical value of the TFT and that of grim trigger is in the denominator (R for a T), which means that $\delta_{TFT}>\delta_{Grim}$.

### General Limited Punishment Strategies

This is the general case for grim trigger and Tit-for-Tat, where those two strategies are the extreme versions of a strategy which punishes defection for some positive number of periods.

As with TFT and grim trigger the expected payoff of limited punishment strategies when on the NE path will be mutual co-operation in all periods:

$$\Pi\left(C\right)=R\left(1+\delta+\delta^2+\ldots\right)=\frac{R}{1-\delta}
$$

The payoff for defection will be T in the first round, P for the number of rounds of punishment, and R thereafter. One round of punishment is TFT, infinite rounds of punishment is grim trigger. For 2 rounds of punishment the expected payoff is

$$\Pi\left(D\right)=T+\delta P+\delta^2P+\delta^3R\left(1+\delta+\delta^2+\ldots\right)=T+\delta\left(1+\delta\right)P+\frac{\delta^3R}{1-\delta}
$$

For 3 rounds of punishment the expected payoff is

$$\Pi\left(D\right)=T+\delta P+\delta^2P+\delta^3P+\delta^3R\left(1+\delta+\delta^2+\ldots\right)=T+\delta\left(1+\delta+\delta^2\right)P+\frac{\delta^4R}{1-\delta}
$$

These are just some examples, but this can be generalised. Suppose the number of rounds of punishment is k, then the expected payoff of defection is

$$\Pi\left(D\right)=T+P\sum_{s=1}^{k}\delta^s+\delta^{k+1}R\left(1+\delta+\delta^2+\ldots\right)=T+P\sum_{s=1}^{k}\delta^s+\frac{\delta^{k+1}R}{1-\delta}
$$

For the limited punishment to be a NE, always playing C must be better in expectation than deviating, thus:

$$\frac{R}{1-\delta}\geq T+P\sum_{s=1}^{k}\delta^s+\frac{\delta^{k+1}R}{1-\delta}
$$ 
$$R\geq T-\delta T+\left(1-\delta\right)P\sum_{s=1}^{k}\delta^s+\delta^{k+1}R
$$ 
$$\delta T-\delta^{k+1}R-\left(1-\delta\right)P\sum_{s=1}^{k}\delta^s\geq T-R
$$

When equated to find the critical value, this above equation does not (to my knowledge) have an analytical solution. It can be simplified though to make it easier to find a solution. As the payoffs of both actions are expected to be the same after the punishment, these will cancel out, thus we only need to consider payoffs in the first round and every round of punishment. We can expand the payoff of co-operating slightly:

$$\frac{R}{1-\delta}=R+R\sum_{s=1}^{k}\delta^s+\frac{\delta^{k+1}R}{1-\delta}
$$

When substituting into the inequality above, the last term cancels out, thus we have the NE condition of:

$$R+R\sum_{s=1}^{k}\delta^s\geq T+P\sum_{s=1}^{k}\delta^s
$$ 
$$\left(R-P\right)\sum_{s=1}^{k}\delta^s\geq T-R
$$

Therefore, we have the condition: 

$$\sum_{s=1}^{k}\delta^s\geq\frac{T-R}{R-P}
$$

It is clear how we recover the TFT condition from this with $k=1$, and then if we let $k\rightarrow\infty$, a little rearranging will give the grim trigger condition also. As the left-hand side is strictly increasing in $k$, the critical values for all limited punishment strategies fall between those of TFT and grim trigger, monotonically decreasing in $k$ from the TFT value down to the grim trigger value.

This can be simplified further as this is just a geometric series $\sum_{s} a_s$ where ${a_{s+k}}/{a_s}=\delta^k$. The general simplification for this is:

$$\sum_{s=1}^{k}\delta^s=\frac{\delta\left(1-\delta^k\right)}{1-\delta}
$$

If $k=1$ (TFT) this simplifies to $\delta$, and as $k\rightarrow\infty$ (Grim trigger) this will tend to $\frac{\delta}{1-\delta}$. The condition for NE can now be shown as:

$$\frac{\delta\left(1-\delta^k\right)}{1-\delta}\geq\frac{T-R}{R-P}
$$

The solutions of this for $k=1$ and $k\rightarrow\infty$ are simple as terms cancel or tend to zero, but solving for any other integer is analytically difficult, and tends to lead to solutions involving roots. For example, taking $k=2$, the condition is:

$$\frac{\delta\left(1-\delta^2\right)}{1-\delta}=\delta\left(1+\delta\right)\geq\frac{T-R}{R-P}
$$ 
$$\Rightarrow\delta^2+\delta-\frac{T-R}{R-P}\geq0
$$

This can be solved:

$$\delta\geq\frac{-1+\sqrt{1+4\frac{T-R}{R-P}}}{2}
$$

Whilst this is not too complex and can easily be found, it does involve a square root, and higher powers will involve higher-order roots to solve.

If we let $\frac{T-R}{R-P}=X$, then one can rearrange:

$$\frac{\delta\left(1-\delta^k\right)}{1-\delta}\geq X
$$ 
$$\delta^{k+1}-\left(X+1\right)\delta+X\le0
$$

To my knowledge, it is not possible to solve this as a function of $k$ (i.e. one cannot solve without first knowing the value of $k$). One can find the solutions using formulas up to $k=3$, but beyond this it is impossible, therefore numeric methods are required.
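
A minimal numerical sketch of this, assuming the $R=48$ game and that `scipy` is available (the figures later in this document use `np.roots` on the equivalent polynomial instead):

```{python}
# The left-hand side is strictly increasing in delta on (0, 1), so the critical
# delta is the unique root of f below; brentq brackets it over (0, 1).
from scipy.optimize import brentq

T, R, P = 50, 48, 25
X = (T - R)/(R - P)

def critical_delta(k):
    f = lambda d: d*(1 - d**k)/(1 - d) - X
    return brentq(f, 1e-9, 1 - 1e-9)

print([round(critical_delta(k), 4) for k in (2, 3, 5, 10)])
```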

### Win-Stay Lose-Switch (Pavlov)

This strategy states that the player should stick with what's working and switch if it doesn't. Outcomes are considered either a 'win' or a 'loss': if one co-operates, then mutual co-operation is a win and perseveres, but being defected against is a loss, and will be met with defection in kind (changing action). Defecting against a player who co-operates is also a win, but if they punish you by defecting also, this is a loss, and will be responded to by co-operating again. Therefore, it is easy to find the NE as it will be supported by the same values of $\delta$ as TFT. The reason for this is that the on-strategy path is to always co-operate, and the deviation from this will be a defection, then one round of punishment, and then back to co-operation, the same deviating path as TFT.


$$\delta_{WSLS}=\delta_{TFT}=\frac{T-R}{R-P}
$$

## Subgame Perfect Equilibrium

I have shown under what conditions the previous strategies can be supported as a NE, but this does not ensure they are Subgame Perfect. To be a SPE the strategy must satisfy the one-deviation property (this is both necessary and sufficient), which states that no player can increase their payoff by changing action for 1 period given any history. Given this definition, not all of the above strategies are SPE. The strategies that are both NE and SPE are AD (for any $\delta$), Grim trigger (for $\delta\ge\delta_{Grim}$), and WSLS (for $\delta\ge\delta_{WSLS}$).

### Grim Trigger

The reason Grim Trigger is SPE is as follows. The outcome path will either be (C, C) in every period or (D, D) in every period. For (C, C), the on-path payoff is $\frac{R}{1-\delta}$, and deviating for 1 period gets the player T for one period and P thereafter equalling $T+\frac{\delta P}{1-\delta}$. Therefore, there is no incentive to deviate if $\delta\geq\delta_{Grim}=\frac{T-R}{T-P}$. When the outcome is (D, D) there is no incentive to change to C as it's worse. Thus, Grim trigger can be a SPE.

### TFT

For TFT, there are 4 possible histories (C, C), (C, D), (D, C) or (D, D). Assume player 2 is playing TFT.

After (C, C), sticking to TFT gives $\frac{R}{1-\delta}$, while deviating to D for one period will get T, but then player 1 goes back to C at the same time as player 2 reacts and defects, so player 1 gets S. They then get in a loop of (D, C) then (C, D) and so on, thus the payoff is:

$$T+\delta S+\delta^2T+\delta^3S+\cdots=\frac{T+\delta S}{1-\delta^2}
$$

For SPE we therefore need:

$$\frac{R}{1-\delta}\ge\frac{T+\delta S}{1-\delta^2}
$$

$$\delta\ge\frac{T-R}{R-S}
$$

After (C, D), sticking to the TFT will see (D, C) then (C, D) and so on, for a payoff of $\frac{T+\delta S}{1-\delta^2}$. If they deviate for only one period (playing C), the outcome will be (C, C) which means that TFT leads to (C, C) forever after. Thus, the payoff of deviation is $\frac{R}{1-\delta}$. Therefore, we need:

$$\delta\le\frac{T-R}{R-S}
$$

After (D, C) sticking to TFT again sees the loop but getting S before T, so the payoff is $\frac{\delta T+S}{1-\delta^2}$. Deviating for one period will mean playing D still, and player 2 will have switched to D, so the outcome is (D, D). TFT states that they will both punish each other by playing D, so they play D forever giving the payoff of deviating of $\frac{P}{1-\delta}$. For SPE we need:

$$\frac{\delta T+S}{1-\delta^2}\geq\frac{P}{1-\delta}
$$
$$\delta\ge\frac{P-S}{T-P}
$$

Finally, after (D, D), the TFT path is D forever giving the payoff of $\frac{P}{1-\delta}$. Deviating for one period will give S for a period, but then they get into the (D, C) to (C, D) loop again giving a payoff of $\frac{\delta T+S}{1-\delta^2}$. For SPE we need:

$$\frac{P}{1-\delta}\geq\frac{\delta T+S}{1-\delta^2}
$$ 
$$\delta\le\frac{P-S}{T-P}
$$

For TFT to be a SPE, all 4 of these conditions must be satisfied, which implies a $\delta$ which satisfies the equation below:

$$\delta_{TFT-SPE}=\frac{T-R}{R-S}=\frac{P-S}{T-P}
$$
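
As a worked example with the stage games used in the paper (T = 50, P = 25, S = 12): the right-hand expression is $\frac{13}{25}=0.52$, while the left-hand expression is $\frac{18}{20}=0.9$ when $R=32$ and $\frac{2}{36}\approx0.06$ when $R=48$. The two never coincide, so TFT cannot be a SPE in either game.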

### Limited Punishment

The solutions to this are generally difficult to find analytically, as the functional form depends on the number of periods of punishment. The NE condition is below, and highlights this issue:

$$\sum_{s=1}^{k}\delta^s\geq\frac{T-R}{R-P}
$$

If $k=1$ (TFT) then the function is linear, if $k=2$ the function is quadratic, and so on. The NE requires one to solve a polynomial of order $k$, which is easy when k is small, but increasingly difficult, and may lead to multiple solutions as k increases. SPE has a similar issue. I have already found the condition for which TFT is a SPE, which covers $k=1$. Therefore, let $k=2$, and consider the same 4 histories plus a couple more that depend on how many rounds of punishment have passed.

After (C, C) the on-path outcomes will be C forever giving a payoff of $\frac{R}{1-\delta}$. If one deviates for a period this will gain T, but they will be punished for 2 periods after. In the first period of punishment, they get S, but they will then reciprocate the punishment, hence the outcome in the 2<sup>nd</sup> punishment period is (D, D), getting P. This resets the punishment from the opponent also, thus they will continually punish each other forever. Thus, the payoff of the deviation is $T+\delta S+\frac{\delta^2P}{1-\delta}$. For the SPE we then need:

$$\frac{R}{1-\delta}\geq T+\delta S+\frac{\delta^2P}{1-\delta}
$$ 
$$\delta\left(R-S\right)-\delta^2\left(P-S\right)\geq\left(1-\delta\right)\left(T-R\right)
$$ 
$$\left(P-S\right)\delta^2-\left(T-S\right)\delta+\left(T-R\right)\le0
$$

This quadratic can be solved via the quadratic equation but has no simplified solution past this point.

If the opponent is playing D, this means that they are punishing the player, and this can be either the first round or second round. Thus, there are 2 histories here where the opponent plays D: if this is the first round then the history of the previous 2 rounds is either {(D, C), (C, D)}, {(D, C), (D, D)} or {(D, D), (D, D)}; if this is the second round of punishment then the history is {(C, D), (D, D)}.

After {(D, C), (C, D)}, staying on strategy means the next outcome will be (D, D), and they will punish each other forever. The payoff is $\frac{P}{1-\delta}$. Deviation for one period means the next outcome will be (C, D) and payoff is S. Going back to the strategy means the next outcome will be (D, C) for a payoff of T. The next period both players will be punishing each other, but lagged a period, thus we get (D, D) forever. Thus, the payoff of deviation is $S+\delta T+\frac{\delta^2P}{1-\delta}$. Therefore, for SPE we need:

$$\frac{P}{1-\delta}\geq S+\delta T+\frac{\delta^2P}{1-\delta}
$$ 
$$\left(T-P\right)\delta^2-\left(T-S\right)\delta+P-S\geq0
$$

This can be solved and simplified:

$$\delta=\frac{T-S\pm\sqrt{\left(T-S\right)^2-4\left(T-P\right)\left(P-S\right)}}{2\left(T-P\right)}
$$ 
$$\delta=\frac{T-S\pm\left(T+S-2P\right)}{2\left(T-P\right)}
$$

Therefore, we have either $\delta\le\frac{P-S}{T-P}$ or $\delta\geq1$. The second condition is impossible as $\delta<1$, thus for SPE it is necessary that: 

$$\delta\le\frac{P-S}{T-P}
$$

After {(D, C), (D, D)} the strategy states that the next outcome will be (D, D) again, thus they get P forever. Deviating for a period will lead to (C, D) for a payoff of S. The outcome after this will be (D, C), then (D, D) after that. This is the same as the previous case, so the condition is the same. The same can be said after history {(D, D), (D, D)}, as the strategy will lead to (D, D) forever and deviating will lead to (C, D), (D, C), then (D, D) forever also.

After {(C, D), (D, D)} the next on-strategy outcome will be (D, D) again as the opponent will start a new 2-round punishment. The deviation outcome will be (C, D) for a payoff of S, and the outcome after will be (D, D) forever after. This is strictly worse for any $\delta$ as $S<P$.

The last history to check is the only other outcome where the opponent plays C. The previous (C, C) history requires both players to have always played C, but the outcome (D, C) does not. It does however require the player to have played C previously. The opponent could have played either C or D. Therefore, we need to check both {(C, D), (D, C)} and {(C, C), (D, C)}. The former has been explored somewhat already. After {(C, D), (D, C)} the next on-strategy outcome will be (D, D), as the player punishes for a second period and the opponent reciprocally punishes. This leads to P forever. Deviating for one period will lead to (C, D), then (D, D) forever which is strictly worse.

After {(C, C), (D, C)} the next on-strategy outcome will be (D, D) as the player is punished. This will be reciprocated, and the outcomes will be (D, D) forever. If they deviate the outcome will be (C, D), then (D, D) forever. The deviation is strictly worse again. (This I was unsure about).

In summary, there are only 2 conditions that must be satisfied for $k=2$ to satisfy a SPE: 

$$\delta\le\frac{P-S}{T-P}
$$ 
$$\left(P-S\right)\delta^2-\left(T-S\right)\delta+\left(T-R\right)\le0
$$

For the payoffs in the paper's games: 

$$\delta\le\frac{13}{25}=0.52
$$ 
$$13\delta^2-38\delta+\left(50-R\right)\le0
$$ 
$$\Rightarrow\frac{38-\sqrt{1444-52\ast\left(50-R\right)}}{26}\le\delta\le\frac{38+\sqrt{1444-52\ast\left(50-R\right)}}{26}
$$ 
$$R=32\Rightarrow0.59\le\delta\le2.33
$$ 
$$R=48\Rightarrow0.054\le\delta\le2.87
$$

For $R=32$, the lower bound of 0.59 exceeds the upper bound of 0.52 from the first condition, so there is never an SPE; but for $R=48$ the lower bound is very low, so the SPE is possible.
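
These ranges can be confirmed with a quick grid check of the two $k=2$ conditions; a minimal sketch:

```{python}
# Test both k = 2 SPE conditions on a grid of delta values for each game.
import numpy as np

T, S, P = 50, 12, 25
for R in (32, 48):
    deltas = np.linspace(0.01, 0.99, 99)
    ok = (deltas <= (P - S)/(T - P)) & ((P - S)*deltas**2 - (T - S)*deltas + (T - R) <= 0)
    sat = deltas[ok]
    print(R, (round(sat.min(), 2), round(sat.max(), 2)) if sat.size else "no SPE")
```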

The lower bound of $\delta$ is determined by: 

$$\delta\geq\frac{T-S-\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{2\left(P-S\right)}
$$

It follows that it is necessary that: 

$$\frac{T-S-\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{2\left(P-S\right)}\le\frac{P-S}{T-P}
$$ 
$$\frac{\left(T-S\right)\left(T-P\right)-2\left(P-S\right)^2}{\left(T-P\right)}\le\frac{\sqrt{\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)}}{1}
$$
$$\bigg[(T-S)-\frac{2(P-S)^2}{(T-P)}\bigg]^2\le\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)
$$
$$(T-S)^2+\frac{4(P-S)^4}{(T-P)^2}-4\frac{(T-S)(P-S)^2}{(T-P)}\le\left(T-S\right)^2-4\left(P-S\right)\left(T-R\right)
$$
$$\frac{(T-S)(P-S)^2}{(T-P)}-\frac{(P-S)^4}{(T-P)^2}-(P-S)(T-R)\ge0
$$

This is pretty intractable at this point, which leads me to believe that higher values of k will be especially intractable and difficult to decipher any SPE from.

### WSLS

Win-Stay Lose-Switch has the same possible histories as TFT, but differing actions thereafter. From (C, C), the on-strategy action is to play C forever with a payoff of $\frac{R}{1-\delta}$. If one deviates for one period (plays D), they will gain T, which is a win, thus they stay with D. The other player lost, so switches to D, so the outcome is (D, D) with a payoff of $\delta P$. They both consider this a loss and both switch to C forever with a payoff of $\frac{\delta^2R}{1-\delta}$. Thus, the payoff of defection for a period is $T+\delta P+\frac{\delta^2R}{1-\delta}$. Therefore, for SPE we need: 

$$\frac{R}{1-\delta}\geq T+\delta P+\frac{\delta^2R}{1-\delta}
$$ 
$$\delta\geq\frac{T-R}{R-P}
$$

After (C, D), WSLS states that one would switch and the opponent stays, so the outcome is (D, D) and payoff is P. Then both switch to C forever and get $\frac{\delta R}{1-\delta}$. If one deviates then they don't initially switch, the outcome is still (C, D) and they get S. The payoff of deviating for a period is then $S+\delta P+\frac{\delta^2R}{1-\delta}$. Therefore, for SPE we need: 

$$P+\frac{\delta R}{1-\delta}\geq S+\delta P+\frac{\delta^2R}{1-\delta}
$$
$$\delta\geq-\frac{P-S}{R-P}
$$

This is always satisfied as $P>S$.

After (D, C), staying on WSLS means staying with D, but one's opponent will switch, thus the outcome will be (D, D). Then they both switch to C forever, so the payoff will be $P+\frac{\delta R}{1-\delta}$. Deviating for a period means switching also, so the outcome is (C, D), the following steps were explained prior, so the payoff will be $S+\delta P+\frac{\delta^2R}{1-\delta}$. The SPE condition is the same as before, so will be satisfied also.

Finally, after (D, D) if both follow the strategy they will play (C, C) forever, giving the payoff of $\frac{R}{1-\delta}$. If one deviates for a period, the outcome will be (D, C) giving T. In the following period the opponent will switch to D, the payoff will be $\delta P$ and the payoff after they switch to C will be $\frac{\delta^2R}{1-\delta}$. This is the same SPE condition as (C, C), so will be satisfied if that is.

Thus, for WSLS to be a SPE we only need $\delta\geq\frac{T-R}{R-P}$. This is the same condition as the NE condition, therefore if WSLS is a NE, it is also a SPE.
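
Before summarising, the thresholds derived so far can be evaluated for the paper's two games; a minimal sketch (a value above one means no $\delta<1$ supports the strategy):

```{python}
# NE thresholds: Grim trigger, and the shared TFT / WSLS value.
T, P = 50, 25
for R in (32, 48):
    print(R, (T - R)/(T - P), (T - R)/(R - P))
```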

## Summary of NE and SPE

::: {#fig-summary-ne-spe}
```{=html}
<table style="width:100%">
  <tr class="table_header">
    <th class="table_header">Strategy</th>
    <th class="table_header">NE</th>
    <th class="table_header">SPE</th>
  </tr>
  <tr>
    <th class="table_contents">AD</th>
    <th class="table_contents">Yes</th>
    <th class="table_contents">Yes</th>
  </tr>
  <tr>
    <th class="table_contents">AC</th>
    <th class="table_contents">No</th>
    <th class="table_contents">No</th>
  </tr>
  <tr>
    <th class="table_contents">Grim Trigger</th>
    <th class="table_contents">$$\delta\ge\frac{T-R}{T-P}
    $$</th>
    <th class="table_contents">$$\delta\ge\frac{T-R}{T-P}
    $$</th>
  </tr>
  <tr>
    <th class="table_contents">Tit-for-Tat</th>
    <th class="table_contents">$$\delta\ge\frac{T-R}{R-P}$$</th>
    <th class="table_contents">$$\delta=\frac{T-R}{R-S}=\frac{P-S}{T-P}$$</th>
  </tr>
  <tr>
    <th class="table_contents">Limited Punishment</th>
    <th class="table_contents">$$\frac{\delta(1-\delta^k)}{1-\delta}\ge\frac{T-R}{R-P}$$</th>
    <th class="table_contents">For k=2, $$\delta\le\frac{P-S}{T-P}$$ and $$\frac{T-S-\sqrt{(T-S)^2-4(P-S)(T-R)}}{2(P-S)}\le\delta$$ For k>2, I don't know.</th>
  </tr>
  <tr>
    <th class="table_contents">Win-Stay-Lose-Switch</th>
    <th class="table_contents">$$\delta\ge\frac{T-R}{R-P}$$</th>
    <th class="table_contents">$$\delta\ge\frac{T-R}{R-P}$$</th>
  </tr>
</table>
```
Summary of NE and SPE
:::

See @fig-summary-ne-spe.

## Graphing the Critical Values of $\delta$

Not all of these need graphing, and in fact the only interesting thing to graph here is the limited punishment. As it is the general case for TFT and grim trigger, the limited punishment NE solutions lie between these two, depending on the number of periods of punishment. Luckily the equation is a sum of a geometric series, therefore it has a closed form simplification. Thus: 
$$\frac{\delta(1-\delta^k)}{1-\delta}\ge\frac{T-R}{R-P}
$$

The solutions for this can be numerically approximated for increasing values of $k$ (and some appropriately chosen payoffs) and plotted to show the path between TFT and Grim trigger. The above can be written as a polynomial of order $k+1$, where $X=(T-R)/(R-P)$: 
$$\delta^{k+1}-(X+1)\delta+X\le0
$$

What will help in graphing the critical values that satisfy a LP NE is that the geometric series is strictly increasing in both $\delta$ and $k$, thus as we increase the value of $k$ it must be that the critical value of $\delta$ will be smaller. We also know the extreme values of the critical $\delta$ are at the extreme values of $k$ ($k=1$ & $k=\infty$), thus we know that every critical value of $\delta$ must lie between these two points. Thus, one can state that in general the critical value ($\bar{\delta}$) must satisfy: 
$$\bar{\delta}\in\bigg[\frac{T-R}{T-P},\frac{T-R}{R-P}\bigg]
$$

This means that when attempting to find the solutions to the $(k+1)$-order polynomials we can restrict ourselves to this above interval. As $\bar{\delta}$ is falling in $k$, we can restrict this interval further for the $k^{th}$ $\bar{\delta}$: 
$$\bar{\delta}_{k+1}\in\bigg[\frac{T-R}{T-P}, \bar{\delta}_k\bigg]
$$

The graph below (@fig-crit-values-plot) shows the $\bar\delta$ for NE for an increasing number of punishment periods. It includes a slider for the value of R (restricted above P but less than T). It shows that at low $k$, limited punishment strategies cannot be a NE unless R is sufficiently large. For TFT to be a NE, R needs to be at least 38.

```{python}
# cSpell:disable

import numpy as np

T = 50
P = 25
K = list(range(1, 21))

def solve(k, R):
    # Find the critical delta for a limited punishment strategy with k
    # punishment periods, given R (T and P are defined above).
    z = k - 1
    X = (T-R) / (R-P)
    zeros = [0] * z
    delta_TFT = X
    delta_GRIM = (T-R) / (T-P)
    error = 0.00000001
    if z == 0:
        # k = 1 is TFT, for which the critical delta is X itself
        delta_crit = delta_TFT
    else:
        # Roots of delta^(k+1) - (X+1)*delta + X = 0
        coeff = [1, *zeros, -X-1, X]
        roots = np.roots(coeff)
        y = np.iscomplex(roots)
        roots_minus_complex = []
        delta_crit = []
        for j in range(len(coeff)-1):
            if not y[j]:
                roots_minus_complex.append(roots[j].real)
        # Keep the real root lying in the known interval [delta_GRIM, delta_TFT]
        for j in range(len(roots_minus_complex)):
            if (roots_minus_complex[j] >= delta_GRIM-error) and (roots_minus_complex[j] <= delta_TFT+error):
                delta_crit = roots_minus_complex[j]
    return delta_crit

deltas = []
for j in range(1, int(T-P+1)):
  R = P+j
  delta=[]
  for i in range(len(K)):
    delta.append({'k': i+1, 'delta': solve(K[i], R)})
  deltas.append({'R': R, 'data': delta})
  delta=[]

ojs_define(lp_crit_delta_data = deltas)
# cSpell:enable
```

::: {#fig-crit-values-plot}

::: {.panel-sidebar}
```{ojs}
import {slider} from '@jashkenas/inputs'
viewof R = slider({
  min:26, 
  max:49,
  step:1,
  value:26,
  title:'R'
})
lp_crit_delta_index=Math.round(R-26)
```

:::

::: {.panel-fill}

::: {.panel-tabset .ojs-track-layout}

## Plot

```{ojs}
Plot.plot({
  x: {
    label: "k",
    labelAnchor: 'center',
    line: true,
    domain: [0, lp_crit_delta_data[lp_crit_delta_index]['data'].length]
  },
  y: {
    label: "δ",
    labelAnchor: 'center',
    line: true,
    grid: true,
    domain: [0, Math.max(lp_crit_delta_data[lp_crit_delta_index]['data'][0]['delta'],1)]
  },
  marks: [
    Plot.line(lp_crit_delta_data[lp_crit_delta_index]['data'], 
    {x: 'k', y: 'delta', stroke: 'darkred'}
    ),
    Plot.ruleY([1])
  ]
})
```

## Data

```{ojs}
Inputs.table(lp_crit_delta_data[lp_crit_delta_index]['data'], {
  columns: [
    "k",
    "delta"
  ],
  header: {
    x: "k",
    delta: "δ"
  }
})
```

:::

:::

Limited Punishment Critical NE Discount Rates for k Punishment Periods
:::

## Using the Decomposition

### Grim Trigger

The condition for Grim trigger to be both a NE and SPE can be expressed using the decomposition as so: 

$$\delta\ge\frac{T-R}{T-P}=\frac{2\eta_1}{\eta_1-\eta_2+2\beta}
$$

### Tit-for-Tat

Suppose we substitute the decomposed elements in for the payoffs in the critical value for TFT to be a NE: 

$$\delta\ge\frac{T-R}{R-P}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta}
$$

Now suppose we do the same for the condition for TFT to also be a SPE: 

$$\delta=\frac{T-R}{R-S}=\frac{P-S}{T-P}
$$ 
$$\Rightarrow\frac{2\eta_1}{-\eta_1+\eta_2+2\beta}=\frac{2\eta_2}{\eta_1-\eta_2+2\beta}
$$

Note that the denominator of the left term is larger than the denominator of the NE critical value, thus SPE requires a δ which is too low for NE. So TFT cannot be both NE and SPE. A little rearranging of the SPE condition will yield: 

$$\eta_1(\eta_1+2\beta)=\eta_2(\eta_2+2\beta)
$$

This condition implies that, for TFT to be a SPE, it is necessary that $\eta_1=\eta_2$ (the function $t(t+2\beta)$ is strictly increasing for $t>0$, so the equality can only hold when the two strategic elements are equal). This is not a sufficient condition, as $\delta$ must also equal the common critical value.

### Limited Punishment

The NE solution is: 

$$\frac{\delta\left(1-\delta^k\right)}{1-\delta}\ge\frac{T-R}{R-P}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta}
$$

### Win-Stay Lose-Switch

This is the same critical value as TFT and LP with $k=1$, but it satisfies both NE and SPE: 

$$\delta\ge\frac{T-R}{R-P}=\frac{2\eta_1}{-\eta_1-\eta_2+2\beta}
$$

### Graphing the critical values as the decomposition changes

```{python}
# cSpell:disable

import numpy as np
import math

eta_1 = 9
eta_2 = 6.5
beta_min = (3 * eta_1 - eta_2) / 2
beta_min_round = math.ceil(beta_min)
beta_max = 30
kappa = 30
K = list(range(1, 21))

def solve(k, beta):
    z = k - 1
    X = 2 * eta_1 / (-eta_1 - eta_2 + 2 * beta)
    zeros = [0] * z
    delta_TFT = max(X, 0)
    delta_GRIM = max(2 * eta_1 / (eta_1 - eta_2 + 2 * beta), 0)
    error = 0.00000001
    if z == 0:
        delta_crit = delta_TFT
    else:
        coeff = [1, *zeros, -X-1, X]
        roots = np.roots(coeff)
        y = np.iscomplex(roots)
        roots_minus_complex = []
        delta_crit = []
        for j in range(len(coeff)-1):
            if not y[j]:
                roots_minus_complex.append(roots[j].real)
        for j in range(len(roots_minus_complex)):
            if (roots_minus_complex[j] >= delta_GRIM-error) and (roots_minus_complex[j] <= delta_TFT+error):
                delta_crit = roots_minus_complex[j]
    return delta_crit

deltas = []
for j in range(beta_min_round, beta_max+1):
  delta=[]
  for i in range(len(K)):
    delta.append({'k': i+1, 'delta': solve(K[i], j)})
  deltas.append({'beta': j, 'data': delta})
  delta=[]
        
ojs_define(lp_crit_delta_beta_data = deltas)
# cSpell:enable
```

::: {#fig-crit-values-plot_decomp}

::: {.panel-sidebar}

```{ojs}
viewof lp_beta = slider({
  min:11, 
  max:30,
  step:1,
  value:11,
  title:'β'
})
lp_crit_delta_beta_index=Math.round(lp_beta-11)
```

:::

::: {.panel-fill}

::: {#lp_crit_delta_beta_data-tabset .panel-tabset .ojs-track-layout}
## Plot

```{ojs}
Plot.plot({
  x: {
    label: "k",
    labelAnchor: 'center',
    line: true,
    domain: [0, lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'].length]
  },
  y: {
    label: "δ",
    labelAnchor: 'center',
    line: true,
    grid: true,
    domain: [0, Math.max(lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'][0]['delta'],1)]
  },
  marks: [
    Plot.line(lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'], 
    {x: 'k',y: 'delta', stroke: 'darkred'}
    ),
    Plot.ruleY([1])
  ]
})
```

## Data

```{ojs}
Inputs.table(lp_crit_delta_beta_data[lp_crit_delta_beta_index]['data'], {
  columns: [
    "k",
    "delta"
  ],
  header: {
    x: "k",
    delta: "δ"
  }
})
```

:::

:::
Limited Punishment Critical NE Discount Rates for k Punishment Periods
:::

See @fig-crit-values-plot_decomp.

## Competing Strategies

In the previous sections I have shown the NE and SPE conditions for each of the above strategies and shown how the decomposition relates to these. All of these conditions were on the basis of the assumption that one's opponent would play the same strategy as you. This is clearly not always the case and many previous experiments (including the one mentioned in this paper) have shown that players choose different strategies and change strategies over time. On this note, it is important to look at how these strategies fare against one another, and then see how the decomposition affects this. This will allow us to infer how the decomposition of PDs affects the relative proportions at which these strategies tend to be played. The popular method to assess how strategies perform against one another is through evolutionary stability.

It is a general point that every ESS is also a NE, but not vice versa; a strategy will be an ESS if it is a strict best response to itself. Therefore, all of the NE conditions need to hold with strict inequality for the strategy to be considered an ESS.

To analyse the dynamics, I will assess how each strategy performs against one another using replicator dynamics. This uses the fitness of a strategy (the payoffs) to state whether the strategy will become more or less prominent. If good (high fitness) it will replicate and be played more, and if bad (low fitness) it will shrink and not be played. Strategies which cannot be invaded are ESS.

Let there be an established strategy $\sigma\in\Sigma$. Suppose a proportion $\lambda>0$ of players switch to a mutant strategy $\hat{\sigma}\ne\sigma$, and the remaining $(1-\lambda)$ continue playing $\sigma$. The average strategy is: 

$$\bar{\sigma}=\lambda\hat{\sigma}+(1-\lambda)\sigma
$$

The fitness of strategy $\sigma$ is defined as the payoff against this average strategy, denoted $\pi(\sigma,\bar{\sigma})$. The growth of these strategies over time is defined as their relative performance to the average fitness. Thus, if the fitness of the mutant strategy $\hat{\sigma}$ against the average strategy is higher than the relevant fitness of the incumbent strategy $\sigma$, then the proportion of mutants $\lambda$ will grow: $\lambda_{t+1}>\lambda_t$.

Replicator dynamics allow us to study these changes. Let the current state of the system be $x_t(\sigma_i)$, which is the proportion of players playing strategy $i$ in time $t$. The differential equation for $x_t$ is given by: 

$$\dot{x_t}(\sigma_i)=[\pi(\sigma_i, \bar{\sigma})-\pi(\bar{\sigma}, \bar{\sigma})]x_t(\sigma_i)
$$

To find the steady states we set $\dot{x}=0$, but in the prisoners' dilemma it is very unlikely that any strategy will be stable, as most can encroach on one another.
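
To make the mechanics concrete, here is a minimal discrete-time sketch of this update for two strategies; the payoff numbers and step size are illustrative placeholders, not part of the analysis below.

```{python}
# Minimal discrete-time sketch of the replicator update for two strategies.
# `payoff(i, j)` is a placeholder long-run payoff of strategy i against strategy j.
def replicator_step(x, payoff, dt=0.01):
    fit_0 = x*payoff(0, 0) + (1-x)*payoff(0, 1)   # fitness of strategy 0 against the average
    fit_1 = x*payoff(1, 0) + (1-x)*payoff(1, 1)   # fitness of strategy 1 against the average
    fit_avg = x*fit_0 + (1-x)*fit_1               # fitness of the average strategy against itself
    return x + dt*x*(fit_0 - fit_avg)

# Illustrative example: one-shot PD payoffs (0 = co-operate, 1 = defect)
pd = {(0, 0): 40, (0, 1): 12, (1, 0): 50, (1, 1): 25}
x = 0.5
for _ in range(2000):
    x = replicator_step(x, lambda i, j: pd[(i, j)])
print(x)  # co-operation dies out in the one-shot game (x tends to 0)
```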

It should be noted that if the mutant strategy is dominated by the incumbent (an AC mutant in an AD population, for example) then the mutant will die out, and vice versa.

### Tit-for-Tat Vs Always Defect

First, suppose the incumbent strategy is AD. The fitness of TFT and AD against the average strategy $\bar{\sigma}=x\circ\sigma_{TFT}+(1-x)\circ\sigma_D$ are below: 

$$\Pi(\sigma_{TFT}, \bar{\sigma})=x\frac{R}{1-\delta}+(1-x)\frac{S+\delta P}{1-\delta^2}
$$ 
$$\Pi(\sigma_{D}, \bar{\sigma})=x\frac{T+\delta P}{1-\delta^2}+(1-x)\frac{P}{1-\delta}
$$

Using these two equations we can find the fitness of the average strategy against itself:

$$\Pi(\bar{\sigma},\bar{\sigma})=
x\Pi(\sigma_{TFT}, \bar{\sigma})+
(1-x)\Pi(\sigma_D, \bar{\sigma})
$$

$$\Pi(\bar{\sigma},\bar{\sigma})=
x^2\frac{R}{1-\delta}+x(1-x)\frac{S+\delta P}{1-\delta^2}+
x(1-x)\frac{T+\delta P}{1-\delta^2}+(1-x)^2\frac{P}{1-\delta}
$$
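
As a quick numerical illustration (a sketch only, using the same payoff values as the code cell further below), these three fitness expressions can be evaluated directly:

```{python}
# Evaluate the three fitness expressions above for illustrative parameter values
T, R, P, S = 50, 40, 25, 12       # same illustrative payoffs as the later code cells
delta, x = 0.8, 0.3               # discount factor and current share of TFT players

fit_tft = x*R/(1-delta) + (1-x)*(S + delta*P)/(1-delta**2)
fit_ad  = x*(T + delta*P)/(1-delta**2) + (1-x)*P/(1-delta)
fit_avg = x*fit_tft + (1-x)*fit_ad    # fitness of the average strategy against itself

print(fit_tft, fit_ad, fit_avg)
```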

The replicator dynamics are given by:

$$\dot{x}=x[\Pi(\sigma_{TFT}, \bar{\sigma})-\Pi(\bar{\sigma}, \bar{\sigma})]
$$

Substituting in yields the differential equation:

$$\dot{x}=x\bigg[x(1-x)\frac{R}{1-\delta}+
(1-x)^2\frac{S+\delta P}{1-\delta^2}-
x(1-x)\frac{T+\delta P}{1-\delta^2}-
(1-x)^2\frac{P}{1-\delta}\bigg]
$$

$$\dot{x}=x(1-x)\bigg(\frac{1}{1-\delta^2}\bigg)
[x((R-T)+\delta(R-P))+(1-x)(S-P)]
$$

$$\dot{x}=\bigg(\frac{1}{1-\delta^2}\bigg)x(1-x)
[x((P-S)-(T-R)+\delta(R-P))-(P-S)]
$$

This has 3 roots: $\bar{x}=0$, $\bar{x}=1$ and $\bar{x}=\frac{P-S}{(P-S)-(T-R)+\delta(R-P)}$.

This can be simplified somewhat, and the decomposition can be substituted in:

$$\dot{x}=\bigg(\frac{1}{1-\delta^2}\bigg)x(1-x)
[x(2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta))-2\eta_2]
$$

The interior root can then be expressed as $\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta)}$.
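
As a sanity check (a sketch with illustrative payoffs), the interior root computed from the raw payoffs and from the decomposed elements should coincide:

```{python}
# Check that the interior root is the same whether written in payoffs or the decomposition
T, R, P, S = 50, 40, 25, 12
delta = 0.8
eta_1, eta_2, beta = (T-R)/2, (P-S)/2, (T+R-S-P)/4

x_bar_payoffs = (P-S)/((P-S)-(T-R)+delta*(R-P))
x_bar_decomp  = 2*eta_2/(2*eta_2-2*eta_1+delta*(-eta_1-eta_2+2*beta))
print(x_bar_payoffs, x_bar_decomp)   # both equal 13/15 with these values
```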

```{python}
# cSpell:disable
import numpy as np

T=50
R=40
P=25
S=12
X=list(np.linspace(0, 1, 101))
Delta_range=list(np.linspace(0.1, 0.9, 9))

def solve(x, delta):
  # Replicator dynamics (x_dot) for TFT invading AD, from the expression above
  x_dot=x*(1-x)*(x*(P-S-T+R+delta*(R-P))-P+S)/(1-delta**2)
  return x_dot

x_dots = []
for j in range(len(Delta_range)):
  x_dot=[]
  for i in range(len(X)):
    x_dot.append({'x': X[i], 'x_dot': solve(X[i], float(Delta_range[j]))})
  x_dots.append({'delta': Delta_range[j], 'data': x_dot})

ojs_define(rep_dyn_tft_ad_data = x_dots)
# cSpell:enable
```

::: {#fig-replicator-dynamics-tft-ad}

::: {.panel-sidebar}

```{ojs}
viewof tft_ad_delta = slider({
  min: 0.1,
  max: 0.9,
  step: 0.1, 
  value: 0.1,
  title: 'δ'
})
rep_dyn_tft_ad_index=Math.round(10*tft_ad_delta-1)
```

:::

::: {.panel-fill}

::: {.panel-tabset .ojs-track-layout}
## Plot

```{ojs}
Plot.plot({
  x: {
    label: "x",
    labelAnchor: "center",
    line: true,
    domain: [0, 1],
  },
  y: {
    label: "ẋ",
    labelAnchor: "center",
    line: true,
    grid: true,
  },
  marks: [
    Plot.line(rep_dyn_tft_ad_data[rep_dyn_tft_ad_index]['data'],
    {x: 'x', y: 'x_dot', stroke: 'darkred'}),
    Plot.ruleY([0])
  ]
})

```

## Data

```{ojs}
Inputs.table(rep_dyn_tft_ad_data[rep_dyn_tft_ad_index]['data'], {
  columns: [
    "x",
    "x_dot"
  ],
  header: {
    x: "x",
    x_dot: "ẋ"
  }
})
```

:::

:::

Replicator Dynamics for TFT against AD
:::
From @fig-replicator-dynamics-tft-ad it can be seen that below a certain discount factor TFT cannot invade AD, which is analogous to the NE condition. Note that the differential equation is equal to zero at $x=0$ and $x=1$. The other root is:

$$\bar{x}=\frac{P-S}{(P-S)-(T-R)+\delta(R-P)}
$$

$$\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta)}
$$

The NE condition can be recovered from this. For the TFT NE to exist the root must be less than or equal to 1, which leads to the same NE condition found before:

$$\delta\ge\frac{T-R}{R-P}
$$

Even then, the proportion of TFT players must exceed this root for TFT to become fully dominant $(x=1)$. The interior root is an unstable steady state: starting below it, TFT dies out; starting above it, TFT takes over the entire population.

This shows that even if a strategy is a NE, it may still fail to become prominent, since it cannot invade a population of defectors from a small initial share.
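
A short sketch (same illustrative payoffs) makes this concrete: the interior root falls as $\delta$ rises, and only drops below 1 once $\delta$ exceeds the critical value $\frac{T-R}{R-P}$:

```{python}
import numpy as np

T, R, P, S = 50, 40, 25, 12
print("critical delta:", (T-R)/(R-P))            # 10/15 ≈ 0.67 with these payoffs

for delta in np.linspace(0.5, 0.9, 5):
    x_bar = (P-S)/((P-S)-(T-R)+delta*(R-P))
    print(f"delta={delta:.2f}  x_bar={x_bar:.3f}  invasion possible: {x_bar <= 1}")
```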

In an effort to maximise TFT's prospects, one could try to minimise the root of the differential equation. One can immediately see that the root is strictly decreasing in $R$ and strictly increasing in $T$. This suggests that minimising the strategic element $\eta_1=\frac{T-R}{2}$ will decrease the root, which is also apparent once the decomposition has been substituted in.

The derivative with respect to P is more complex, but its numerator can be written as:

$$f(\delta)=\delta(R-S)-(T-R)
$$

This shows that the root will be increasing in P if $\delta>\frac{T-R}{R-S}$. As $S<P$, this value of $\delta$ is smaller than the critical value for the TFT NE, so the condition is satisfied whenever TFT is a NE. Therefore, one can conclude:

$$\frac{\partial\bar{x}}{\partial P}>0
$$

The effect of S is less clear-cut, since increasing S reduces both the numerator and the denominator. The numerator of the derivative with respect to S is:

$$f(\delta)=[(T-R)-\delta(R-P)]
$$

Therefore, the root is strictly decreasing in S whenever $\delta>\frac{T-R}{R-P}$, so we again want the strategic element $\eta_2=\frac{P-S}{2}$ to be as small as possible.

One can also find the derivative with respect to $\beta$; it is clear that $\bar{x}$ is falling in the behavioural component. Interestingly, this derivative is quite dependent on $\delta$:

$$\frac{\partial\bar{x}}{\partial\beta}=\frac{-4\delta\eta_2}{(2\eta_2-2\eta_1+\delta(-\eta_1-\eta_2+2\beta))^2}
$$

This shows that, keeping the game strategically the same, the higher the discount factor, the smaller the marginal effect of the behavioural component on the invasion threshold $\bar{x}$. More importantly, at low discount factors, possibly below the critical $\delta$, the behavioural component has a larger effect in reducing this threshold, making it a powerful tool for helping TFT invade AD.
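
These comparative statics can also be checked symbolically; the following is a sketch assuming SymPy is available:

```{python}
import sympy as sp

T, R, P, S, delta = sp.symbols('T R P S delta', positive=True)
D = (P-S)-(T-R)+delta*(R-P)          # denominator of the interior root
x_bar = (P-S)/D

# Numerators of the derivatives with respect to P and S (the D**2 denominator is positive)
num_P = sp.simplify(sp.diff(x_bar, P)*D**2)   # equals delta*(R-S)-(T-R), as in the text
num_S = sp.simplify(sp.diff(x_bar, S)*D**2)   # equals (T-R)-delta*(R-P), as in the text
print(sp.expand(num_P))
print(sp.expand(num_S))
```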

@fig-replicator-dynamics-tft-ad-full-custom below shows the replicator dynamics for different values of the decomposition.

```{python}
# cSpell:disable
import numpy as np

eta_1=list(np.linspace(1, 5, 5))
eta_2=list(np.linspace(1, 5, 5))
beta=list(np.linspace(5, 15, 11))

X=list(np.linspace(0, 1, 101))
Delta_range=list(np.linspace(0.1, 0.9, 9))

def solve_x(x, delta, eta_1, eta_2, beta):
  # Replicator dynamics (x_dot) for TFT invading AD, written with the decomposition
  x_dot=x*(1-x)*(x*(2*eta_2-2*eta_1+delta*(-eta_1-eta_2+2*beta))-2*eta_2)/(1-delta**2)
  return x_dot

def solve_x_bar(delta, eta_1, eta_2, beta):
  # Interior root of the replicator dynamics; undefined if the denominator is zero
  denom = 2*eta_2-2*eta_1+delta*(-eta_1-eta_2+2*beta)
  return 2*eta_2/denom if denom != 0 else ''

x_dots = []

for i in range(len(Delta_range)):
  for j in range(len(eta_1)):
    for k in range(len(eta_2)):
      for l in range(len(beta)):
        x_bar = solve_x_bar(float(Delta_range[i]), float(eta_1[j]), float(eta_2[k]), float(beta[l]))
        x_dot=[]
        for m in range(len(X)):
          x_dot.append({'x': X[m], 'x_dot': solve_x(X[m], float(Delta_range[i]), float(eta_1[j]), float(eta_2[k]), float(beta[l]))})
        x_dots.append({'delta': Delta_range[i], 'eta_1': eta_1[j], 'eta_2': eta_2[k], 'beta': beta[l], 'data': x_dot, 'x_bar': x_bar})

ojs_define(rep_dyn_tft_ad_decomp_data = x_dots)
# cSpell:enable
```

::: {#fig-replicator-dynamics-tft-ad-full-custom}

::: {.panel-sidebar}

```{ojs}
viewof tft_ad_decomp_eta_1 = slider({
  min: 1,
  max: 5,
  step: 1,
  value: 1,
  title: 'η₁'
})
viewof tft_ad_decomp_eta_2 = slider({
  min: 1,
  max: 5,
  step: 1,
  value: 1,
  title: 'η₂'
})
viewof tft_ad_decomp_beta = slider({
  min: 5,
  max: 15,
  step: 1,
  value: 5,
  title: 'β'
})
viewof tft_ad_decomp_delta = slider({
  min: 0.1,
  max: 0.9,
  step: 0.1,
  value: 0.1,
  title: 'δ'
})
rep_dyn_tft_ad_decomp_index = Math.round(
  2750*(tft_ad_decomp_delta-0.1)+55*(tft_ad_decomp_eta_1-1
  )+11*(tft_ad_decomp_eta_2-1)+tft_ad_decomp_beta-5
  )
rep_dyn_tft_ad_decomp_x_bar_var = rep_dyn_tft_ad_decomp_data[rep_dyn_tft_ad_decomp_index]['x_bar']
viewof tft_ad_decomp_x_bar = {
  const element = htl.html`<div>x̄=${rep_dyn_tft_ad_decomp_x_bar_var}</div>`
  return element
}
```

:::

::: {.panel-fill}

::: {.panel-tabset .ojs-track-layout}
## Plot
```{ojs}
Plot.plot({
  x: {
    label: "x",
    labelAnchor: 'center',
    line: true,
    domain: [0, 1]
  },
  y: {
    label: "ẋ",
    labelAnchor: 'center',
    line: true,
    grid: true,
  },
  marks: [
    Plot.line(rep_dyn_tft_ad_decomp_data[rep_dyn_tft_ad_decomp_index]['data'], 
    {x: 'x',y: 'x_dot', stroke: 'darkred'}
    ),
    Plot.ruleY([0])
  ]
})
```
## Data
```{ojs}
Inputs.table(rep_dyn_tft_ad_decomp_data[rep_dyn_tft_ad_decomp_index]['data'], {
  columns: [
    "x",
    "x_dot"
  ],
  header: {
    x: "x",
    x_dot: "ẋ"
  }
})
```

:::

:::
Replicator Dynamics for TFT against AD for Different Values of the Decomposition
:::

### Grim Trigger Vs Always Defect

Suppose that AD is the incumbent strategy and Grim trigger wishes to invade. The fitness of each strategy against the average strategy $\bar{\sigma}=x\circ\sigma_{Grim}+(1-x)\circ\sigma_{D}$ is:

$$\Pi(\sigma_{Grim},\bar{\sigma})=x\frac{R}{1-\delta}+(1-x)(S+\frac{\delta P}{1-\delta})
$$

$$\Pi(\sigma_{D}, \bar{\sigma})=x(T+\delta\frac{P}{1-\delta})+(1-x)(\frac{P}{1-\delta})
$$

Combining these to give the fitness of the average strategy against itself:

$$\Pi(\bar{\sigma}, \bar{\sigma})=x^2\frac{R}{1-\delta}+x(1-x)(S+\frac{\delta P}{1-\delta})+(1-x)x(T+\frac{\delta P}{1-\delta})+(1-x)^2\frac{P}{1-\delta}
$$

The replicator dynamics are given by:

$$\dot{x}=x\bigg[x(1-x)\frac{R}{1-\delta}+(1-x)^2(S+\frac{\delta P}{1-\delta})-x(1-x)(T+\frac{\delta P}{1-\delta})-(1-x)^2\frac{P}{1-\delta}\bigg]
$$

$$\dot{x}=x(1-x)[x((P-S)-(T-R)+\frac{\delta}{1-\delta}(R-P))-(P-S)]
$$

This has 3 roots: $\bar{x}=0, \bar{x}=1$ and $\bar{x}=\frac{P-S}{(P-S)-(T-R)+\frac{\delta}{1-\delta}(R-P)}$

The inner solution in terms of the decomposition is:

$$\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\frac{\delta}{1-\delta}(-\eta_1-\eta_2+2\beta)}
$$
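
The only difference from the TFT case is that the $(R-P)$ term is now scaled by $\frac{\delta}{1-\delta}$ rather than $\delta$, so Grim Trigger's punishment weighs more heavily and its invasion threshold is lower. A quick numerical comparison (illustrative payoffs again):

```{python}
T, R, P, S = 50, 40, 25, 12
delta = 0.8

x_bar_tft  = (P-S)/((P-S)-(T-R)+delta*(R-P))             # TFT threshold
x_bar_grim = (P-S)/((P-S)-(T-R)+delta/(1-delta)*(R-P))   # Grim Trigger threshold
print(x_bar_tft, x_bar_grim)   # Grim Trigger needs a much smaller initial share to invade
```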

### Limited Punishment Strategies Vs Always Defect

Suppose the number of periods of punishment is $k$, then against the average strategy of $\bar{\sigma}=x\circ\sigma_{LP}+(1-x)\circ\sigma_D$, the fitness of limited punishment is:

$$\Pi(\sigma_{LP}, \bar{\sigma})=x\frac{R}{1-\delta}+(1-x)\frac{S+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}
$$

The fitness of AD against the average is:

$$\Pi(\sigma_{D}, \bar{\sigma})=x\frac{T+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}+(1-x)\frac{P}{1-\delta}
$$

Therefore, the fitness of the average strategy against itself is:

$$\Pi(\bar{\sigma}, \bar{\sigma})=x^2\frac{R}{1-\delta}+x(1-x)\frac{S+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}+x(1-x)\frac{T+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}+(1-x)^2\frac{P}{1-\delta}
$$

The replicator dynamics are therefore given by:

$$\dot{x}=x\bigg[x(1-x)\frac{R}{1-\delta}+(1-x)^2\frac{S+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}-x(1-x)\frac{T+P\Sigma_{s=1}^{k}\delta^s}{1-\delta^{k+1}}-(1-x)^2\frac{P}{1-\delta}\bigg]
$$

$$\dot{x}=\frac{1}{1-\delta^{k+1}}x(1-x)[x((P-S)-(T-R)+\frac{\delta(1-\delta^k)}{1-\delta}(R-P))-(P-S)]
$$

The roots of this are $\bar{x}=0$, $\bar{x}=1$ and $\bar{x}=\frac{P-S}{(P-S)-(T-R)+\frac{\delta(1-\delta^k)}{1-\delta}(R-P)}$.

It can easily be seen that this is the general case, nesting TFT $(k=1)$ and Grim Trigger $(k\to\infty)$. Substituting in the decomposition:

$$\bar{x}=\frac{2\eta_2}{2\eta_2-2\eta_1+\frac{\delta(1-\delta^k)}{1-\delta}(-\eta_1-\eta_2+2\beta)}
$$

The effect of each component on this root depends on the discount factor, but the behavioural component enters only through the term scaled by the discount factor, whereas the strategic elements also enter directly.
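
A short sketch (illustrative payoffs) shows how the threshold falls as the punishment length $k$ grows, reproducing the TFT root at $k=1$ and approaching the Grim Trigger root as $k\to\infty$:

```{python}
T, R, P, S = 50, 40, 25, 12
delta = 0.8

def x_bar_lp(k):
    # Interior root for a limited punishment strategy with k punishment periods
    return (P-S)/((P-S)-(T-R)+delta*(1-delta**k)/(1-delta)*(R-P))

for k in [1, 2, 5, 10, 50]:
    print(k, round(x_bar_lp(k), 3))
print("Grim Trigger limit:", round((P-S)/((P-S)-(T-R)+delta/(1-delta)*(R-P)), 3))
```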

For figure see [Replicator Dynamics for Limited Punishment](file:///G:/My%20Drive/UCL%20MPhil-PhD/Decomposition/Repeated%20Games/Prisoner's%20Dilemma/Replicator%20Dynamics%20For%20Limited%20Punishment.html){.external target="_Replicator Dynamics for Limited Punishment"}

### Win-Stay Lose-Switch Vs Always Defect

Next, consider the WSLS strategy against the incumbent AD. The fitness of each strategy against the average strategy $\bar{\sigma}=x\circ\sigma_{WSLS}+(1-x)\circ\sigma_D$ is the same as for TFT, since the two strategies follow the same on- and off-equilibrium paths against AD.

### Win-Stay Lose-Switch Vs Tit-for-Tat

Suppose that TFT is the incumbent strategy and WSLS wishes to invade. The fitness of each strategy against the average strategy $\bar{\sigma}=x\circ\sigma_{WSLS}+(1-x)\circ\sigma_{TFT}$ is the same, because both strategies prioritise playing C and, absent any D choices, will always play C. Thus, they perform equally well and neither will invade the other. This is no longer true if there is a small fraction of D choices, perhaps mistakes or opportunistic choices intended to 'test the waters'. Let Tit-for-Tat be the incumbent, and assume that in every period a fraction $\epsilon$ of all players make a mistake for that period. Can WSLS take advantage of the defectors and invade? Let the strategies with error be $\sigma_{TFTE}$ and $\sigma_{WSLSE}$; the average strategy in the population is then:

$$\bar{\sigma}=x[(1-\epsilon)\circ\sigma_{WSLS}+\epsilon\circ\sigma_{WSLSE}]+(1-x)[(1-\epsilon)\circ\sigma_{TFT}+\epsilon\circ\sigma_{TFTE}]
$$

The fitness of WSLS against the new average strategy is found by:

$$\Pi(\sigma_{WSLS}, \bar{\sigma})=x\frac{R}{1-\delta}+(1-x)[(1-\epsilon)(R+\delta Q_{C,C})+\epsilon(S+\delta Q_{C,D})]
$$

Here, $Q_{C,C}$ is the value of the future payoffs after a C, C outcome when paired with a TFTE player, and likewise $Q_{C,D}$ is the value of the future payoffs after a C, D outcome when paired with a TFTE player. We can find expressions for these terms by expanding them and looking for patterns. Take $Q_{C,C}$: the on-strategy choice is to continue playing C, so the next outcome depends only on whether the TFTE player makes a mistake. If the TFTE player plays C, the outcome is C, C, both get R now, and the future payoffs are $Q_{C,C}$. If the TFTE player makes a mistake and plays D, the outcome is C, D, and the payoff is S plus the discounted value of the payoffs after a C, D outcome $(Q_{C,D})$:

$$Q_{C,C}=(1-\epsilon)(R+\delta Q_{C,C})+\epsilon(S+\delta Q_{C,D})
$$

In order to evaluate this further, we need to delve deeper into the term $Q_{C,D}$. WSLS states that after a C, D outcome one should switch to D. If one's TFTE opponent follows their strategy they will revert to C, leading to a D, C outcome. This gives the highest payoff T for one period, but leads to punishment from the TFTE opponent. The following outcome again depends on mistakes: D, D if they make no mistake, giving a payoff of P; or D, C if they mistakenly still play C, giving the high payoff T again. The total future payoff after the former outcome can be denoted $Q_{D,D}$, and after the latter it is again $Q_{D,C}$.

But if they make a mistake, they will still play D. This leads to a D, D outcome with a payoff of P. Both players, if following their strategies, will then switch to C, for a payoff of R. If the TFTE player makes a mistake we get C, D again, giving a payoff of S. This can all be brought together below.

$$Q_{C,D}=(1-\epsilon)[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta Q_{D,C})]+\epsilon[P+\delta(1-\epsilon)(R+\delta Q_{C,C})+\delta\epsilon(S+\delta Q_{C,D})]
$$

We also know that:

$$Q_{D,C}=(1-\epsilon)(P+\delta Q_{D,D})+\epsilon(T+\delta Q_{D,C})
$$

Therefore, the penultimate step is to find an expression for $Q_{D,D}$. Again, this has been stated before:

$$Q_{D,D}=(1-\epsilon)(R+\delta Q_{C,C})+\epsilon(S+\delta Q_{C,D})
$$
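
Before simplifying by hand, the four equations above can also be solved directly as a linear system in the four continuation values; the following is a sketch, assuming SymPy is available and using illustrative values for the payoffs, $\delta$ and $\epsilon$:

```{python}
import sympy as sp

# Illustrative values only; payoffs match the earlier code cells
T, R, P, S = 50, 40, 25, 12
delta, eps = sp.Rational(8, 10), sp.Rational(1, 20)

Qcc, Qcd, Qdc, Qdd = sp.symbols('Q_CC Q_CD Q_DC Q_DD')

# The four continuation-value equations, written exactly as above
eqs = [
    sp.Eq(Qcc, (1-eps)*(R + delta*Qcc) + eps*(S + delta*Qcd)),
    sp.Eq(Qcd, (1-eps)*(T + delta*(1-eps)*(P + delta*Qdd) + delta*eps*(T + delta*Qdc))
             + eps*(P + delta*(1-eps)*(R + delta*Qcc) + delta*eps*(S + delta*Qcd))),
    sp.Eq(Qdc, (1-eps)*(P + delta*Qdd) + eps*(T + delta*Qdc)),
    sp.Eq(Qdd, (1-eps)*(R + delta*Qcc) + eps*(S + delta*Qcd)),
]

print(sp.solve(eqs, [Qcc, Qcd, Qdc, Qdd]))   # exact rational continuation values
```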

These 4 equations will provide us with solutions to substitute into the fitness equation.  Immediately, the first 3 equations can be simplified as terms exist on both sides of the equation:

$$Q_{C,C}=\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\frac{\epsilon}{1-\delta(1-\epsilon)}(S+\delta Q_{C,D})
$$

$$Q_{C,D}=\frac{1-\epsilon}{1-\delta^2\epsilon^2}[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta Q_{D,C})]+\frac{\epsilon}{1-\delta^2\epsilon^2}[P+\delta(1-\epsilon)(R+\delta Q_{C,C})+\delta\epsilon S]
$$

$$Q_{D,C}=\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\frac{\epsilon}{1-\delta\epsilon}T
$$

Substituting $Q_{C,C}$ reduces us to three equations:

$$Q_{C,D}=\frac{1-\epsilon}{1-\delta^2\epsilon^2}[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta Q_{D,C})]+\frac{\epsilon}{1-\delta^2\epsilon^2}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}(S+\delta Q_{C,D}))+\delta\epsilon S]
$$

$$Q_{D,C}=\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\frac{\epsilon}{1-\delta\epsilon}T
$$

$$Q_{D,D}=(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\epsilon(S+\delta Q_{C,D})
$$

Simplify $Q_{D,D}$ such that $Q_{D,D}$ is only on the left of the equation:

$$Q_{D,D}=\frac{(1-\epsilon)(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\frac{\epsilon(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(S+\delta Q_{C,D})
$$

Now substitute in $Q_{D,C}$ to leave us with two equations:

$$Q_{C,D}=\frac{1-\epsilon}{1-\delta^2\epsilon^2}[T+\delta(1-\epsilon)(P+\delta Q_{D,D})+\delta\epsilon(T+\delta\frac{1-\epsilon}{1-\delta\epsilon}(P+\delta Q_{D,D})+\delta\frac{\epsilon}{1-\delta\epsilon}T)]+\frac{\epsilon}{1-\delta^2\epsilon^2}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}(S+\delta Q_{C,D}))+\delta\epsilon S]
$$

$$Q_{D,D}=\frac{(1-\epsilon)(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\frac{\epsilon(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(S+\delta Q_{C,D})
$$

One can then simplify $Q_{C,D}$ and substitute in for $Q_{D,D}$:

$$Q_{C,D}=\frac{(1-\epsilon)(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[T+\delta(1-\epsilon)P+\frac{\delta^2(1-\epsilon)}{1-\delta\epsilon}Q_{D,D}+\delta\epsilon(T+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)]+\frac{\epsilon(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}S)+\delta\epsilon S]
$$

Substituting in for $Q_{D,D}$: 

$$Q_{C,D}=\frac{(1-\epsilon)(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[T+\delta(1-\epsilon)P+\frac{\delta^2(1-\epsilon)}{1-\delta\epsilon}\frac{(1-\epsilon)(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(R+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)+\frac{\delta^2(1-\epsilon)}{1-\delta\epsilon}\frac{\epsilon(1-\delta\epsilon)}{1-\delta\epsilon+\delta^2(1-\epsilon)^2}(S+\delta Q_{C,D})+\delta\epsilon(T+\delta\frac{1-\epsilon}{1-\delta\epsilon}P+\delta\frac{\epsilon}{1-\delta\epsilon}T)]+\frac{\epsilon(1-\delta(1-\epsilon))}{(1-\delta^2\epsilon^2-\delta(1-\epsilon))}[P+\delta(1-\epsilon)(R+\delta\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\delta\frac{\epsilon}{1-\delta(1-\epsilon)}S)+\delta\epsilon S]
$$

$$Q_{C,D}=\frac{(1-\delta(1-\epsilon))[(1-\epsilon)(1-\delta\epsilon+2\delta^2(1-\epsilon)^2)T+((1-\delta\epsilon+\delta^2(1-\epsilon)^2)(\delta(1-\epsilon)+\epsilon(1-\delta))+\delta^3(1-\epsilon)^4)P]}{(1-\delta\epsilon)[(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))]}+\frac{\delta^2(1-\epsilon)^2+\delta\epsilon(1-\epsilon)(1-\delta\epsilon)-\delta^3(1-\epsilon)^4}{(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))}R+\frac{\delta^3\epsilon(1-\epsilon)^2(1-2\epsilon)-\delta^2\epsilon(2\epsilon-1)+\delta\epsilon^2}{(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))}S
$$


Substituting in to get $Q_{C,C}$:

$$Q_{C,C}=\frac{1-\epsilon}{1-\delta(1-\epsilon)}R+\frac{\epsilon}{1-\delta(1-\epsilon)}S+\frac{\delta\epsilon}{1-\delta(1-\epsilon)}Q_{C,D}
$$ 

$$Q_{C,C}=\frac{\delta\epsilon}{1-\delta\epsilon}\frac{(1-\epsilon)(1-\delta\epsilon+2\delta^2(1-\epsilon)^2)T+((1-\delta\epsilon+\delta^2(1-\epsilon)^2)(\delta(1-\epsilon)+\epsilon(1-\delta))+\delta^3(1-\epsilon)^4)P}{(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))}+\frac{\delta(1-\epsilon)(1-\delta(1-\epsilon))(\delta(1-\epsilon)+\epsilon(1-\delta\epsilon)-\delta^2(1-\epsilon)^2)+(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\epsilon-\delta^2\epsilon^2(1-\epsilon)+\delta(1-\epsilon)^2)}{(1-\delta(1-\epsilon))[(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))]}R+\frac{\epsilon(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))+\delta\epsilon(1-\delta(1-\epsilon))(\delta^2(1-\epsilon)^2(1-3\epsilon)-\delta(2\epsilon-1)+\epsilon)}{(1-\delta(1-\epsilon))[(1-\delta\epsilon+\delta^2(1-\epsilon)^2)(1-\delta^2\epsilon^2-\delta(1-\epsilon))-\delta^3\epsilon(1-\epsilon)^2(1-\delta(1-\epsilon))]}S
$$ 


